Part I - PISA Data Exploration¶

by Alham Hotaki¶

Introduction¶

PISA is a survey that examines students from compulsory education on how well prepared they are for life after school. This investigation focuses on the PISA Survey from 2012, with data belonging to around 500K students from 65 different countries.

Preliminary Wrangling¶

In [178]:
# import all packages and set plots to be embedded inline
import numpy as np
import pandas as pd
from IPython.display import display
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.ticker import FuncFormatter
from scipy import stats
import warnings
warnings.filterwarnings("ignore", category=FutureWarning)

%matplotlib inline
In [183]:
# failed to read file as utf-8. changed to ISO-8859-1 instead.
df_pisa=pd.read_csv('pisa2012.csv', encoding = "ISO-8859-1")
C:\Users\LENOVO\anaconda3\lib\site-packages\IPython\core\interactiveshell.py:3194: DtypeWarning: Columns (15,16,17,21,22,23,24,25,26,30,31,36,37,45,65,123,155,156,157,158,159,160,161,162,163,164,165,166,167,168,169,170,171,284,285,286,287,288,289,290,291,292,293,294,295,296,297,298,299,300,301,302,303,307,308,309,310,311,312,313,314,315,316,317,318,319,320,321,322,323,324,325,326,327,328,329,330,331,332,333,334,335,336,337,338,339,340,341,342,343,344,345,346,347,348,349,350,351,352,353,354,355,356,357,376,377,378,379,380,381,382,383,384,385,386,387,388,389,390,391,392,393,394,395,396,397,398,399,400,401,402,403,475) have mixed types.Specify dtype option on import or set low_memory=False.
  has_raised = await self.run_ast_nodes(code_ast.body, cell_name,
In [3]:
df_pisa.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 485490 entries, 0 to 485489
Columns: 636 entries, Unnamed: 0 to VER_STU
dtypes: float64(250), int64(18), object(368)
memory usage: 2.3+ GB
In [4]:
df_pisa.shape
Out[4]:
(485490, 636)
In [6]:
df_pisa_dict=pd.read_csv('pisadict2012.csv', encoding = "ISO-8859-1")
In [7]:
pd.options.display.max_rows = len(df_pisa)
pd.options.display.max_columns = len(df_pisa.columns)
In [173]:
print(df_pisa_dict).head()
      Unnamed: 0                                                  x
0            CNT                           Country code 3-character
1       SUBNATIO  Adjudicated sub-region code 7-digit code (3-di...
2        STRATUM  Stratum ID 7-character (cnt + region ID + orig...
3           OECD                                       OECD country
4             NC                       National Centre 6-digit Code
5       SCHOOLID  School ID 7-digit (region ID + stratum ID + 3-...
6        STIDSTD                                         Student ID
7        ST01Q01                                International Grade
8        ST02Q01                           National Study Programme
9        ST03Q01                                      Birth - Month
10       ST03Q02                                        Birth -Year
11       ST04Q01                                             Gender
12       ST05Q01                                   Attend <ISCED 0>
13       ST06Q01                                   Age at <ISCED 1>
14       ST07Q01                                 Repeat - <ISCED 1>
15       ST07Q02                                 Repeat - <ISCED 2>
16       ST07Q03                                 Repeat - <ISCED 3>
17       ST08Q01                          Truancy - Late for School
18       ST09Q01                    Truancy - Skip whole school day
19      ST115Q01           Truancy - Skip classes within school day
20       ST11Q01                                   At Home - Mother
21       ST11Q02                                   At Home - Father
22       ST11Q03                                 At Home - Brothers
23       ST11Q04                                  At Home - Sisters
24       ST11Q05                             At Home - Grandparents
25       ST11Q06                                   At Home - Others
26       ST13Q01                          Mother<Highest Schooling>
27       ST14Q01            Mother Qualifications - <ISCED level 6>
28       ST14Q02           Mother Qualifications - <ISCED level 5A>
29       ST14Q03           Mother Qualifications - <ISCED level 5B>
30       ST14Q04            Mother Qualifications - <ISCED level 4>
31       ST15Q01                          Mother Current Job Status
32       ST17Q01                          Father<Highest Schooling>
33       ST18Q01            Father Qualifications - <ISCED level 6>
34       ST18Q02           Father Qualifications - <ISCED level 5A>
35       ST18Q03           Father Qualifications - <ISCED level 5B>
36       ST18Q04            Father Qualifications - <ISCED level 4>
37       ST19Q01                          Father Current Job Status
38       ST20Q01              Country of Birth International - Self
39       ST20Q02            Country of Birth International - Mother
40       ST20Q03            Country of Birth International - Father
41       ST21Q01                Age of arrival in <country of test>
42       ST25Q01                     International Language at Home
43       ST26Q01                                 Possessions - desk
44       ST26Q02                             Possessions - own room
45       ST26Q03                          Possessions - study place
46       ST26Q04                             Possessions - computer
47       ST26Q05                             Possessions - software
48       ST26Q06                             Possessions - Internet
49       ST26Q07                           Possessions - literature
50       ST26Q08                               Possessions - poetry
51       ST26Q09                                  Possessions - art
52       ST26Q10                            Possessions - textbooks
53       ST26Q11          Possessions - <technical reference books>
54       ST26Q12                           Possessions - dictionary
55       ST26Q13                           Possessions - dishwasher
56       ST26Q14                                Possessions - <DVD>
57       ST26Q15                     Possessions - <Country item 1>
58       ST26Q16                     Possessions - <Country item 2>
59       ST26Q17                     Possessions - <Country item 3>
60       ST27Q01                         How many - cellular phones
61       ST27Q02                             How many - televisions
62       ST27Q03                               How many - computers
63       ST27Q04                                    How many - cars
64       ST27Q05                    How many - rooms bath or shower
65       ST28Q01                             How many books at home
66       ST29Q01                      Math Interest - Enjoy Reading
67       ST29Q02      Instrumental Motivation - Worthwhile for Work
68       ST29Q03            Math Interest - Look Forward to Lessons
69       ST29Q04                        Math Interest - Enjoy Maths
70       ST29Q05  Instrumental Motivation - Worthwhile for Caree...
71       ST29Q06                         Math Interest - Interested
72       ST29Q07  Instrumental Motivation - Important for Future...
73       ST29Q08       Instrumental Motivation - Helps to Get a Job
74       ST35Q01   Subjective Norms -Friends Do Well in Mathematics
75       ST35Q02  Subjective Norms -Friends Work Hard on Mathema...
76       ST35Q03  Subjective Norms - Friends Enjoy Mathematics T...
77       ST35Q04  Subjective Norms - Parents Believe Studying Ma...
78       ST35Q05  Subjective Norms - Parents Believe Mathematics...
79       ST35Q06        Subjective Norms - Parents Like Mathematics
80       ST37Q01     Math Self-Efficacy - Using a <Train Timetable>
81       ST37Q02       Math Self-Efficacy - Calculating TV Discount
82       ST37Q03  Math Self-Efficacy - Calculating Square Metres...
83       ST37Q04  Math Self-Efficacy - Understanding Graphs in N...
84       ST37Q05            Math Self-Efficacy - Solving Equation 1
85       ST37Q06             Math Self-Efficacy - Distance to Scale
86       ST37Q07            Math Self-Efficacy - Solving Equation 2
87       ST37Q08  Math Self-Efficacy - Calculate Petrol Consumpt...
88       ST42Q01     Math Anxiety - Worry That It Will Be Difficult
89       ST42Q02              Math Self-Concept - Not Good at Maths
90       ST42Q03                      Math Anxiety - Get Very Tense
91       ST42Q04               Math Self-Concept- Get Good <Grades>
92       ST42Q05                    Math Anxiety - Get Very Nervous
93       ST42Q06                  Math Self-Concept - Learn Quickly
94       ST42Q07           Math Self-Concept - One of Best Subjects
95       ST42Q08                       Math Anxiety - Feel Helpless
96       ST42Q09      Math Self-Concept - Understand Difficult Work
97       ST42Q10   Math Anxiety - Worry About Getting Poor <Grades>
98       ST43Q01  Perceived Control - Can Succeed with Enough Ef...
99       ST43Q02  Perceived Control - Doing Well is Completely U...
100      ST43Q03    Perceived Control - Family Demands and Problems
101      ST43Q04             Perceived Control - Different Teachers
102      ST43Q05  Perceived Control - If I Wanted I Could Perfor...
103      ST43Q06      Perceived Control - Perform Poorly Regardless
104      ST44Q01  Attributions to Failure - Not Good at Maths Pr...
105      ST44Q03  Attributions to Failure - Teacher Did Not Expl...
106      ST44Q04              Attributions to Failure - Bad Guesses
107      ST44Q05        Attributions to Failure - Material Too Hard
108      ST44Q07  Attributions to Failure - Teacher Didnt Get St...
109      ST44Q08                  Attributions to Failure - Unlucky
110      ST46Q01       Math Work Ethic - Homework Completed in Time
111      ST46Q02            Math Work Ethic - Work Hard on Homework
112      ST46Q03               Math Work Ethic - Prepared for Exams
113      ST46Q04           Math Work Ethic - Study Hard for Quizzes
114      ST46Q05  Math Work Ethic - Study Until I Understand Eve...
115      ST46Q06         Math Work Ethic - Pay Attention in Classes
116      ST46Q07                Math Work Ethic - Listen in Classes
117      ST46Q08  Math Work Ethic - Avoid Distractions When Stud...
118      ST46Q09              Math Work Ethic - Keep Work Organized
119      ST48Q01  Math Intentions - Mathematics vs. Language Cou...
120      ST48Q02  Math Intentions - Mathematics vs. Science Rela...
121      ST48Q03  Math Intentions - Study Harder in Mathematics ...
122      ST48Q04  Math Intentions - Take Maximum Number of Mathe...
123      ST48Q05  Math Intentions - Pursuing a Career That Invol...
124      ST49Q01     Math Behaviour - Talk about Maths with Friends
125      ST49Q02           Math Behaviour - Help Friends with Maths
126      ST49Q03        Math Behaviour - <Extracurricular> Activity
127      ST49Q04       Math Behaviour - Participate in Competitions
128      ST49Q05  Math Behaviour - Study More Than 2 Extra Hours...
129      ST49Q06                        Math Behaviour - Play Chess
130      ST49Q07              Math Behaviour - Computer programming
131      ST49Q09          Math Behaviour - Participate in Math Club
132      ST53Q01  Learning Strategies- Important Parts vs. Exist...
133      ST53Q02  Learning Strategies- Improve Understanding vs....
134      ST53Q03  Learning Strategies - Other Subjects vs. Learn...
135      ST53Q04  Learning Strategies - Repeat Examples vs. Ever...
136      ST55Q01                Out of school lessons - <test lang>
137      ST55Q02                    Out of school lessons - <maths>
138      ST55Q03                  Out of school lessons - <science>
139      ST55Q04                      Out of school lessons - other
140      ST57Q01                Out-of-School Study Time - Homework
141      ST57Q02         Out-of-School Study Time - Guided Homework
142      ST57Q03          Out-of-School Study Time - Personal Tutor
143      ST57Q04      Out-of-School Study Time - Commercial Company
144      ST57Q05             Out-of-School Study Time - With Parent
145      ST57Q06                Out-of-School Study Time - Computer
146      ST61Q01  Experience with Applied Maths Tasks - Use <Tra...
147      ST61Q02  Experience with Applied Maths Tasks - Calculat...
148      ST61Q03  Experience with Applied Maths Tasks - Calculat...
149      ST61Q04  Experience with Applied Maths Tasks - Understa...
150      ST61Q05  Experience with Pure Maths Tasks - Solve Equat...
151      ST61Q06  Experience with Applied Maths Tasks - Use a Ma...
152      ST61Q07  Experience with Pure Maths Tasks - Solve Equat...
153      ST61Q08  Experience with Applied Maths Tasks - Calculat...
154      ST61Q09  Experience with Applied Maths Tasks - Solve Eq...
155      ST62Q01  Familiarity with Math Concepts - Exponential F...
156      ST62Q02           Familiarity with Math Concepts - Divisor
157      ST62Q03  Familiarity with Math Concepts - Quadratic Fun...
158      ST62Q04                       Overclaiming - Proper Number
159      ST62Q06   Familiarity with Math Concepts - Linear Equation
160      ST62Q07           Familiarity with Math Concepts - Vectors
161      ST62Q08    Familiarity with Math Concepts - Complex Number
162      ST62Q09   Familiarity with Math Concepts - Rational Number
163      ST62Q10          Familiarity with Math Concepts - Radicals
164      ST62Q11                 Overclaiming - Subjunctive Scaling
165      ST62Q12           Familiarity with Math Concepts - Polygon
166      ST62Q13                Overclaiming - Declarative Fraction
167      ST62Q15  Familiarity with Math Concepts - Congruent Figure
168      ST62Q16            Familiarity with Math Concepts - Cosine
169      ST62Q17   Familiarity with Math Concepts - Arithmetic Mean
170      ST62Q19       Familiarity with Math Concepts - Probability
171      ST69Q01                Min in <class period> - <test lang>
172      ST69Q02                    Min in <class period> - <Maths>
173      ST69Q03                  Min in <class period> - <Science>
174      ST70Q01            No of <class period> p/wk - <test lang>
175      ST70Q02                No of <class period> p/wk - <Maths>
176      ST70Q03              No of <class period> p/wk - <Science>
177      ST71Q01                    No of ALL <class period> a week
178      ST72Q01  Class Size - No of Students in <Test Language>...
179      ST73Q01        OTL - Algebraic Word Problem in Math Lesson
180      ST73Q02              OTL - Algebraic Word Problem in Tests
181      ST74Q01               OTL - Procedural Task in Math Lesson
182      ST74Q02                     OTL - Procedural Task in Tests
183      ST75Q01           OTL - Pure Math Reasoning in Math Lesson
184      ST75Q02                 OTL - Pure Math Reasoning in Tests
185      ST76Q01        OTL - Applied Math Reasoning in Math Lesson
186      ST76Q02              OTL - Applied Math Reasoning in Tests
187      ST77Q01             Math Teaching - Teacher shows interest
188      ST77Q02                         Math Teaching - Extra help
189      ST77Q04                      Math Teaching - Teacher helps
190      ST77Q05                  Math Teaching - Teacher continues
191      ST77Q06                   Math Teaching - Express opinions
192      ST79Q01    Teacher-Directed Instruction - Sets Clear Goals
193      ST79Q02  Teacher-Directed Instruction - Encourages Thin...
194      ST79Q03  Student Orientation - Differentiates Between S...
195      ST79Q04     Student Orientation - Assigns Complex Projects
196      ST79Q05              Formative Assessment - Gives Feedback
197      ST79Q06  Teacher-Directed Instruction - Checks Understa...
198      ST79Q07  Student Orientation - Has Students Work in Sma...
199      ST79Q08  Teacher-Directed Instruction - Summarizes Prev...
200      ST79Q10   Student Orientation - Plans Classroom Activities
201      ST79Q11  Formative Assessment - Gives Feedback on Stren...
202      ST79Q12  Formative Assessment - Informs about Expectations
203      ST79Q15  Teacher-Directed Instruction - Informs about L...
204      ST79Q17     Formative Assessment - Tells How to Get Better
205      ST80Q01  Cognitive Activation - Teacher Encourages to R...
206      ST80Q04  Cognitive Activation - Gives Problems that Req...
207      ST80Q05  Cognitive Activation - Asks to Use Own Procedures
208      ST80Q06  Cognitive Activation - Presents Problems with ...
209      ST80Q07  Cognitive Activation - Presents Problems in Di...
210      ST80Q08   Cognitive Activation - Helps Learn from Mistakes
211      ST80Q09       Cognitive Activation - Asks for Explanations
212      ST80Q10       Cognitive Activation - Apply What We Learned
213      ST80Q11  Cognitive Activation - Problems with Multiple ...
214      ST81Q01       Disciplinary Climate - Students Don’t Listen
215      ST81Q02          Disciplinary Climate - Noise and Disorder
216      ST81Q03  Disciplinary Climate - Teacher Has to Wait Unt...
217      ST81Q04    Disciplinary Climate - Students Don’t Work Well
218      ST81Q05  Disciplinary Climate - Students Start Working ...
219      ST82Q01  Vignette Teacher Support -Homework Every Other...
220      ST82Q02  Vignette Teacher Support - Homework Once a Wee...
221      ST82Q03  Vignette Teacher Support - Homework Once a Wee...
222      ST83Q01  Teacher Support - Lets Us Know We Have to Work...
223      ST83Q02  Teacher Support - Provides Extra Help When Needed
224      ST83Q03     Teacher Support - Helps Students with Learning
225      ST83Q04  Teacher Support - Gives Opportunity to Express...
226      ST84Q01  Vignette Classroom Management - Students Frequ...
227      ST84Q02  Vignette Classroom Management - Students Are C...
228      ST84Q03  Vignette Classroom Management - Students Frequ...
229      ST85Q01             Classroom Management - Students Listen
230      ST85Q02  Classroom Management - Teacher Keeps Class Ord...
231      ST85Q03      Classroom Management - Teacher Starts On Time
232      ST85Q04   Classroom Management - Wait Long to <Quiet Down>
233      ST86Q01  Student-Teacher Relation - Get Along with Teac...
234      ST86Q02  Student-Teacher Relation - Teachers Are Intere...
235      ST86Q03  Student-Teacher Relation - Teachers Listen to ...
236      ST86Q04  Student-Teacher Relation - Teachers Help Students
237      ST86Q05  Student-Teacher Relation - Teachers Treat Stud...
238      ST87Q01            Sense of Belonging - Feel Like Outsider
239      ST87Q02           Sense of Belonging - Make Friends Easily
240      ST87Q03              Sense of Belonging - Belong at School
241      ST87Q04        Sense of Belonging - Feel Awkward at School
242      ST87Q05       Sense of Belonging - Liked by Other Students
243      ST87Q06         Sense of Belonging - Feel Lonely at School
244      ST87Q07          Sense of Belonging - Feel Happy at School
245      ST87Q08    Sense of Belonging - Things Are Ideal at School
246      ST87Q09           Sense of Belonging - Satisfied at School
247      ST88Q01  Attitude towards School - Does Little to Prepa...
248      ST88Q02            Attitude towards School - Waste of Time
249      ST88Q03       Attitude towards School - Gave Me Confidence
250      ST88Q04            Attitude towards School- Useful for Job
251      ST89Q02        Attitude toward School - Helps to Get a Job
252      ST89Q03       Attitude toward School - Prepare for College
253      ST89Q04         Attitude toward School - Enjoy Good Grades
254      ST89Q05  Attitude toward School - Trying Hard is Important
255      ST91Q01  Perceived Control - Can Succeed with Enough Ef...
256      ST91Q02  Perceived Control - My Choice Whether I Will B...
257      ST91Q03  Perceived Control - Problems Prevent from Putt...
258      ST91Q04  Perceived Control - Different Teachers Would M...
259      ST91Q05  Perceived Control - Could Perform Well if I Wa...
260      ST91Q06        Perceived Control - Perform Poor Regardless
261      ST93Q01                      Perseverance - Give up easily
262      ST93Q03          Perseverance - Put off difficult problems
263      ST93Q04                   Perseverance - Remain interested
264      ST93Q06              Perseverance - Continue to perfection
265      ST93Q07                 Perseverance - Exceed expectations
266      ST94Q05  Openness for Problem Solving - Can Handle a Lo...
267      ST94Q06  Openness for Problem Solving - Quick to Unders...
268      ST94Q09   Openness for Problem Solving - Seek Explanations
269      ST94Q10      Openness for Problem Solving - Can Link Facts
270      ST94Q14  Openness for Problem Solving - Like to Solve C...
271      ST96Q01          Problem Text Message - Press every button
272      ST96Q02                 Problem Text Message - Trace steps
273      ST96Q03                      Problem Text Message - Manual
274      ST96Q05                Problem Text Message - Ask a friend
275     ST101Q01            Problem Route Selection - Read brochure
276     ST101Q02                Problem Route Selection - Study map
277     ST101Q03      Problem Route Selection - Leave it to brother
278     ST101Q05               Problem Route Selection - Just drive
279     ST104Q01              Problem Ticket Machine - Similarities
280     ST104Q04               Problem Ticket Machine - Try buttons
281     ST104Q05              Problem Ticket Machine - Ask for help
282     ST104Q06        Problem Ticket Machine - Find ticket office
283      IC01Q01                         At Home - Desktop Computer
284      IC01Q02                          At Home - Portable laptop
285      IC01Q03                          At Home - Tablet computer
286      IC01Q04                      At Home - Internet connection
287      IC01Q05                      At Home - Video games console
288      IC01Q06                  At Home - Cell phone w/o Internet
289      IC01Q07                 At Home - Cell phone with Internet
290      IC01Q08                           At Home - Mp3/Mp4 player
291      IC01Q09                                  At Home - Printer
292      IC01Q10                       At Home - USB (memory) stick
293      IC01Q11                             At Home - Ebook reader
294      IC02Q01                       At school - Desktop Computer
295      IC02Q02                        At school - Portable laptop
296      IC02Q03                        At school - Tablet computer
297      IC02Q04                    At school - Internet connection
298      IC02Q05                                At school - Printer
299      IC02Q06                     At school - USB (memory) stick
300      IC02Q07                           At school - Ebook reader
301      IC03Q01                             First use of computers
302      IC04Q01                           First access to Internet
303      IC05Q01                                 Internet at School
304      IC06Q01                   Internet out-of-school - Weekday
305      IC07Q01                   Internet out-of-school - Weekend
306      IC08Q01                Out-of-school 8 - One player games.
307      IC08Q02            Out-of-school 8 - ColLabourative games.
308      IC08Q03                        Out-of-school 8 - Use email
309      IC08Q04                     Out-of-school 8 - Chat on line
310      IC08Q05                  Out-of-school 8 - Social networks
311      IC08Q06      Out-of-school 8 - Browse the Internet for fun
312      IC08Q07                        Out-of-school 8 - Read news
313      IC08Q08  Out-of-school 8 - Obtain practical information...
314      IC08Q09                   Out-of-school 8 - Download music
315      IC08Q11                   Out-of-school 8 - Upload content
316      IC09Q01              Out-of-school 9 - Internet for school
317      IC09Q02                   Out-of-school 9 - Email students
318      IC09Q03                   Out-of-school 9 - Email teachers
319      IC09Q04             Out-of-school 9 - Download from School
320      IC09Q05                    Out-of-school 9 - Announcements
321      IC09Q06                         Out-of-school 9 - Homework
322      IC09Q07            Out-of-school 9 - Share school material
323      IC10Q01                           At School - Chat on line
324      IC10Q02                                  At School - Email
325      IC10Q03                  At School - Browse for schoolwork
326      IC10Q04                  At School - Download from website
327      IC10Q05                        At School - Post on website
328      IC10Q06                            At School - Simulations
329      IC10Q07                  At School - Practice and drilling
330      IC10Q08                               At School - Homework
331      IC10Q09                             At School - Group work
332      IC11Q01                         Maths lessons - Draw graph
333      IC11Q02           Maths lessons - Calculation with numbers
334      IC11Q03                  Maths lessons - Geometric figures
335      IC11Q04                        Maths lessons - Spreadsheet
336      IC11Q05                            Maths lessons - Algebra
337      IC11Q06                         Maths lessons - Histograms
338      IC11Q07                   Maths lessons - Change in graphs
339      IC22Q01                  Attitudes - Useful for schoolwork
340      IC22Q02                      Attitudes - Homework more fun
341      IC22Q04                  Attitudes - Source of information
342      IC22Q06                            Attitudes - Troublesome
343      IC22Q07            Attitudes - Not suitable for schoolwork
344      IC22Q08                         Attitudes - Too unreliable
345      EC01Q01                         Miss 2 months of <ISCED 1>
346      EC02Q01                         Miss 2 months of <ISCED 2>
347      EC03Q01                    Future Orientation - Internship
348      EC03Q02              Future Orientation - Work-site visits
349      EC03Q03                      Future Orientation - Job fair
350      EC03Q04      Future Orientation - Career advisor at school
351      EC03Q05  Future Orientation - Career advisor outside sc...
352      EC03Q06                 Future Orientation - Questionnaire
353      EC03Q07               Future Orientation - Internet search
354      EC03Q08   Future Orientation - Tour<ISCED 3-5> institution
355      EC03Q09   Future Orientation - web search <ISCED 3-5> prog
356      EC03Q10       Future Orientation - <country specific item>
357     EC04Q01A   Acquired skills - Find job info - Yes, at school
358     EC04Q01B  Acquired skills - Find job info - Yes, out of ...
359     EC04Q01C        Acquired skills - Find job info - No, never
360     EC04Q02A  Acquired skills - Search for job - Yes, at school
361     EC04Q02B  Acquired skills - Search for job - Yes, out of...
362     EC04Q02C       Acquired skills - Search for job - No, never
363     EC04Q03A    Acquired skills - Write resume - Yes, at school
364     EC04Q03B  Acquired skills - Write resume - Yes, out of s...
365     EC04Q03C         Acquired skills - Write resume - No, never
366     EC04Q04A   Acquired skills - Job interview - Yes, at school
367     EC04Q04B  Acquired skills - Job interview - Yes, out of ...
368     EC04Q04C        Acquired skills - Job interview - No, never
369     EC04Q05A  Acquired skills - ISCED 3-5 programs - Yes, at...
370     EC04Q05B  Acquired skills - ISCED 3-5 programs - Yes, ou...
371     EC04Q05C   Acquired skills - ISCED 3-5 programs - No, never
372     EC04Q06A  Acquired skills - Student financing - Yes, at ...
373     EC04Q06B  Acquired skills - Student financing - Yes, out...
374     EC04Q06C    Acquired skills - Student financing - No, never
375      EC05Q01                             First language learned
376      EC06Q01               Age started learning <test language>
377      EC07Q01                           Language spoken - Mother
378      EC07Q02                           Language spoken - Father
379      EC07Q03                         Language spoken - Siblings
380      EC07Q04                      Language spoken - Best friend
381      EC07Q05                      Language spoken - Schoolmates
382      EC08Q01                      Activities language - Reading
383      EC08Q02                  Activities language - Watching TV
384      EC08Q03             Activities language - Internet surfing
385      EC08Q04               Activities language - Writing emails
386      EC09Q03  Types of support <test language> - remedial le...
387      EC10Q01                  Amount of support <test language>
388      EC11Q02       Attend lessons <heritage language> - focused
389      EC11Q03  Attend lessons <heritage language> - school su...
390      EC12Q01                 Instruction in <heritage language>
391      ST22Q01          Acculturation - Mother Immigrant (Filter)
392      ST23Q01       Acculturation - Enjoy <Host Culture> Friends
393      ST23Q02   Acculturation - Enjoy <Heritage Culture> Friends
394      ST23Q03  Acculturation - Enjoy <Host Culture> Celebrations
395      ST23Q04  Acculturation - Enjoy <Heritage Culture> Celeb...
396      ST23Q05  Acculturation - Spend Time with <Host Culture>...
397      ST23Q06  Acculturation - Spend Time with <Heritage Cult...
398      ST23Q07  Acculturation - Participate in <Host Culture> ...
399      ST23Q08  Acculturation - Participate in <Heritage Cultu...
400      ST24Q01  Acculturation - Perceived Host-Heritage Cultur...
401      ST24Q02  Acculturation - Perceived Host-Heritage Cultur...
402      ST24Q03  Acculturation - Perceived Host-Heritage Cultur...
403      CLCUSE1                                     Calculator Use
404    CLCUSE301                                      Effort-real 1
405    CLCUSE302                                      Effort-real 2
406      DEFFORT                               Difference in Effort
407      QUESTID                         Student Questionnaire Form
408       BOOKID                                         Booklet ID
409         EASY             Standard or simplified set of booklets
410          AGE                                     Age of student
411        GRADE           Grade compared to modal grade in country
412        PROGN               Unique national study programme code
413       ANXMAT                                Mathematics Anxiety
414       ATSCHL         Attitude towards School: Learning Outcomes
415     ATTLNACT       Attitude towards School: Learning Activities
416       BELONG                       Sense of Belonging to School
417        BFMJ2                                     Father SQ ISEI
418        BMMJ1                                     Mother SQ ISEI
419       CLSMAN         Mathematics Teacher's Classroom Management
420       COBN_F       Country of Birth National Categories- Father
421       COBN_M       Country of Birth National Categories- Mother
422       COBN_S         Country of Birth National Categories- Self
423       COGACT        Cognitive Activation in Mathematics Lessons
424     CULTDIST  Cultural Distance between Host and Heritage Cu...
425      CULTPOS                               Cultural Possessions
426     DISCLIMA                               Disciplinary Climate
427       ENTUSE                              ICT Entertainment Use
428         ESCS      Index of economic, social and cultural status
429      EXAPPLM  Experience with Applied Mathematics Tasks at S...
430      EXPUREM   Experience with Pure Mathematics Tasks at School
431      FAILMAT             Attributions to Failure in Mathematics
432       FAMCON             Familiarity with Mathematical Concepts
433      FAMCONC  Familiarity with Mathematical Concepts (Signal...
434     FAMSTRUC                                   Family Structure
435       FISCED                Educational level of father (ISCED)
436       HEDRES                         Home educational resources
437     HERITCUL  Acculturation: Heritage Culture Oriented  Stra...
438       HISCED               Highest educational level of parents
439        HISEI               Highest parental occupational status
440      HOMEPOS                                   Home Possessions
441       HOMSCH           ICT Use at Home for School-related Tasks
442      HOSTCUL    Acculturation: Host Culture Oriented Strategies
443    ICTATTNEG  Attitudes Towards Computers: Limitations of th...
444    ICTATTPOS  Attitudes Towards Computers: Computer as a Too...
445      ICTHOME                           ICT Availability at Home
446       ICTRES                                      ICT resources
447       ICTSCH                         ICT Availability at School
448        IMMIG                                 Immigration status
449      INFOCAR                          Information about Careers
450     INFOJOB1  Information about the Labour Market provided b...
451     INFOJOB2  Information about the Labour Market provided o...
452      INSTMOT            Instrumental Motivation for Mathematics
453       INTMAT                               Mathematics Interest
454       ISCEDD                                  ISCED designation
455       ISCEDL                                        ISCED level
456       ISCEDO                                  ISCED orientation
457     LANGCOMM  Preference for Heritage Language in Conversati...
458        LANGN                    Language at home (3-digit code)
459     LANGRPPD  Preference for Heritage Language in Language R...
460        LMINS  Learning time (minutes per week)  - <test lang...
461       MATBEH                              Mathematics Behaviour
462      MATHEFF                          Mathematics Self-Efficacy
463     MATINTFC                             Mathematics Intentions
464     MATWKETH                             Mathematics Work Ethic
465       MISCED                Educational level of mother (ISCED)
466        MMINS    Learning time (minutes per week)- <Mathematics>
467        MTSUP                      Mathematics Teacher's Support
468        OCOD1                   ISCO-08 Occupation code - Mother
469        OCOD2                   ISCO-08 Occupation code - Father
470       OPENPS                       Openness for Problem Solving
471     OUTHOURS                           Out-of-School Study Time
472        PARED                Highest parental education in years
473       PERSEV                                       Perseverance
474       REPEAT                                   Grade Repetition
475        SCMAT                           Mathematics Self-Concept
476        SMINS       Learning time (minutes per week) - <Science>
477      STUDREL                          Teacher Student Relations
478      SUBNORM                    Subjective Norms in Mathematics
479     TCHBEHFA            Teacher Behaviour: Formative Assessment
480     TCHBEHSO             Teacher Behaviour: Student Orientation
481     TCHBEHTD    Teacher Behaviour: Teacher-directed Instruction
482     TEACHSUP                                    Teacher Support
483     TESTLANG                               Language of the test
484      TIMEINT                        Time of computer use (mins)
485      USEMATH                   Use of ICT in Mathematic Lessons
486       USESCH                               Use of ICT at School
487       WEALTH                                             Wealth
488    ANCATSCHL  Attitude towards School: Learning Outcomes (An...
489  ANCATTLNACT  Attitude towards School: Learning Activities (...
490    ANCBELONG            Sense of Belonging to School (Anchored)
491    ANCCLSMAN  Mathematics Teacher's Classroom Management (An...
492    ANCCOGACT  Cognitive Activation in Mathematics Lessons (A...
493   ANCINSTMOT  Instrumental Motivation for Mathematics (Ancho...
494    ANCINTMAT                    Mathematics Interest (Anchored)
495  ANCMATWKETH                  Mathematics Work Ethic (Anchored)
496     ANCMTSUP           Mathematics Teacher's Support (Anchored)
497     ANCSCMAT                Mathematics Self-Concept (Anchored)
498   ANCSTUDREL               Teacher Student Relations (Anchored)
499   ANCSUBNORM         Subjective Norms in Mathematics (Anchored)
500      PV1MATH                   Plausible value 1 in mathematics
501      PV2MATH                   Plausible value 2 in mathematics
502      PV3MATH                   Plausible value 3 in mathematics
503      PV4MATH                   Plausible value 4 in mathematics
504      PV5MATH                   Plausible value 5 in mathematics
505      PV1MACC  Plausible value 1 in content subscale of math ...
506      PV2MACC  Plausible value 2 in content subscale of math ...
507      PV3MACC  Plausible value 3 in content subscale of math ...
508      PV4MACC  Plausible value 4 in content subscale of math ...
509      PV5MACC  Plausible value 5 in content subscale of math ...
510      PV1MACQ  Plausible value 1 in content subscale of math ...
511      PV2MACQ  Plausible value 2 in content subscale of math ...
512      PV3MACQ  Plausible value 3 in content subscale of math ...
513      PV4MACQ  Plausible value 4 in content subscale of math ...
514      PV5MACQ  Plausible value 5 in content subscale of math ...
515      PV1MACS  Plausible value 1 in content subscale of math ...
516      PV2MACS  Plausible value 2 in content subscale of math ...
517      PV3MACS  Plausible value 3 in content subscale of math ...
518      PV4MACS  Plausible value 4 in content subscale of math ...
519      PV5MACS  Plausible value 5 in content subscale of math ...
520      PV1MACU  Plausible value 1 in content subscale of math ...
521      PV2MACU  Plausible value 2 in content subscale of math ...
522      PV3MACU  Plausible value 3 in content subscale of math ...
523      PV4MACU  Plausible value 4 in content subscale of math ...
524      PV5MACU  Plausible value 5 in content subscale of math ...
525      PV1MAPE  Plausible value 1 in process subscale of math ...
526      PV2MAPE  Plausible value 2 in process subscale of math ...
527      PV3MAPE  Plausible value 3 in process subscale of math ...
528      PV4MAPE  Plausible value 4 in process subscale of math ...
529      PV5MAPE  Plausible value 5 in process subscale of math ...
530      PV1MAPF  Plausible value 1 in process subscale of math ...
531      PV2MAPF  Plausible value 2 in process subscale of math ...
532      PV3MAPF  Plausible value 3 in process subscale of math ...
533      PV4MAPF  Plausible value 4 in process subscale of math ...
534      PV5MAPF  Plausible value 5 in process subscale of math ...
535      PV1MAPI  Plausible value 1 in process subscale of math ...
536      PV2MAPI  Plausible value 2 in process subscale of math ...
537      PV3MAPI  Plausible value 3 in process subscale of math ...
538      PV4MAPI  Plausible value 4 in process subscale of math ...
539      PV5MAPI  Plausible value 5 in process subscale of math ...
540      PV1READ                       Plausible value 1 in reading
541      PV2READ                       Plausible value 2 in reading
542      PV3READ                       Plausible value 3 in reading
543      PV4READ                       Plausible value 4 in reading
544      PV5READ                       Plausible value 5 in reading
545      PV1SCIE                       Plausible value 1 in science
546      PV2SCIE                       Plausible value 2 in science
547      PV3SCIE                       Plausible value 3 in science
548      PV4SCIE                       Plausible value 4 in science
549      PV5SCIE                       Plausible value 5 in science
550     W_FSTUWT                               FINAL STUDENT WEIGHT
551      W_FSTR1            FINAL STUDENT REPLICATE BRR-FAY WEIGHT1
552      W_FSTR2            FINAL STUDENT REPLICATE BRR-FAY WEIGHT2
553      W_FSTR3            FINAL STUDENT REPLICATE BRR-FAY WEIGHT3
554      W_FSTR4            FINAL STUDENT REPLICATE BRR-FAY WEIGHT4
555      W_FSTR5            FINAL STUDENT REPLICATE BRR-FAY WEIGHT5
556      W_FSTR6            FINAL STUDENT REPLICATE BRR-FAY WEIGHT6
557      W_FSTR7            FINAL STUDENT REPLICATE BRR-FAY WEIGHT7
558      W_FSTR8            FINAL STUDENT REPLICATE BRR-FAY WEIGHT8
559      W_FSTR9            FINAL STUDENT REPLICATE BRR-FAY WEIGHT9
560     W_FSTR10           FINAL STUDENT REPLICATE BRR-FAY WEIGHT10
561     W_FSTR11           FINAL STUDENT REPLICATE BRR-FAY WEIGHT11
562     W_FSTR12           FINAL STUDENT REPLICATE BRR-FAY WEIGHT12
563     W_FSTR13           FINAL STUDENT REPLICATE BRR-FAY WEIGHT13
564     W_FSTR14           FINAL STUDENT REPLICATE BRR-FAY WEIGHT14
565     W_FSTR15           FINAL STUDENT REPLICATE BRR-FAY WEIGHT15
566     W_FSTR16           FINAL STUDENT REPLICATE BRR-FAY WEIGHT16
567     W_FSTR17           FINAL STUDENT REPLICATE BRR-FAY WEIGHT17
568     W_FSTR18           FINAL STUDENT REPLICATE BRR-FAY WEIGHT18
569     W_FSTR19           FINAL STUDENT REPLICATE BRR-FAY WEIGHT19
570     W_FSTR20           FINAL STUDENT REPLICATE BRR-FAY WEIGHT20
571     W_FSTR21           FINAL STUDENT REPLICATE BRR-FAY WEIGHT21
572     W_FSTR22           FINAL STUDENT REPLICATE BRR-FAY WEIGHT22
573     W_FSTR23           FINAL STUDENT REPLICATE BRR-FAY WEIGHT23
574     W_FSTR24           FINAL STUDENT REPLICATE BRR-FAY WEIGHT24
575     W_FSTR25           FINAL STUDENT REPLICATE BRR-FAY WEIGHT25
576     W_FSTR26           FINAL STUDENT REPLICATE BRR-FAY WEIGHT26
577     W_FSTR27           FINAL STUDENT REPLICATE BRR-FAY WEIGHT27
578     W_FSTR28           FINAL STUDENT REPLICATE BRR-FAY WEIGHT28
579     W_FSTR29           FINAL STUDENT REPLICATE BRR-FAY WEIGHT29
580     W_FSTR30           FINAL STUDENT REPLICATE BRR-FAY WEIGHT30
581     W_FSTR31           FINAL STUDENT REPLICATE BRR-FAY WEIGHT31
582     W_FSTR32           FINAL STUDENT REPLICATE BRR-FAY WEIGHT32
583     W_FSTR33           FINAL STUDENT REPLICATE BRR-FAY WEIGHT33
584     W_FSTR34           FINAL STUDENT REPLICATE BRR-FAY WEIGHT34
585     W_FSTR35           FINAL STUDENT REPLICATE BRR-FAY WEIGHT35
586     W_FSTR36           FINAL STUDENT REPLICATE BRR-FAY WEIGHT36
587     W_FSTR37           FINAL STUDENT REPLICATE BRR-FAY WEIGHT37
588     W_FSTR38           FINAL STUDENT REPLICATE BRR-FAY WEIGHT38
589     W_FSTR39           FINAL STUDENT REPLICATE BRR-FAY WEIGHT39
590     W_FSTR40           FINAL STUDENT REPLICATE BRR-FAY WEIGHT40
591     W_FSTR41           FINAL STUDENT REPLICATE BRR-FAY WEIGHT41
592     W_FSTR42           FINAL STUDENT REPLICATE BRR-FAY WEIGHT42
593     W_FSTR43           FINAL STUDENT REPLICATE BRR-FAY WEIGHT43
594     W_FSTR44           FINAL STUDENT REPLICATE BRR-FAY WEIGHT44
595     W_FSTR45           FINAL STUDENT REPLICATE BRR-FAY WEIGHT45
596     W_FSTR46           FINAL STUDENT REPLICATE BRR-FAY WEIGHT46
597     W_FSTR47           FINAL STUDENT REPLICATE BRR-FAY WEIGHT47
598     W_FSTR48           FINAL STUDENT REPLICATE BRR-FAY WEIGHT48
599     W_FSTR49           FINAL STUDENT REPLICATE BRR-FAY WEIGHT49
600     W_FSTR50           FINAL STUDENT REPLICATE BRR-FAY WEIGHT50
601     W_FSTR51           FINAL STUDENT REPLICATE BRR-FAY WEIGHT51
602     W_FSTR52           FINAL STUDENT REPLICATE BRR-FAY WEIGHT52
603     W_FSTR53           FINAL STUDENT REPLICATE BRR-FAY WEIGHT53
604     W_FSTR54           FINAL STUDENT REPLICATE BRR-FAY WEIGHT54
605     W_FSTR55           FINAL STUDENT REPLICATE BRR-FAY WEIGHT55
606     W_FSTR56           FINAL STUDENT REPLICATE BRR-FAY WEIGHT56
607     W_FSTR57           FINAL STUDENT REPLICATE BRR-FAY WEIGHT57
608     W_FSTR58           FINAL STUDENT REPLICATE BRR-FAY WEIGHT58
609     W_FSTR59           FINAL STUDENT REPLICATE BRR-FAY WEIGHT59
610     W_FSTR60           FINAL STUDENT REPLICATE BRR-FAY WEIGHT60
611     W_FSTR61           FINAL STUDENT REPLICATE BRR-FAY WEIGHT61
612     W_FSTR62           FINAL STUDENT REPLICATE BRR-FAY WEIGHT62
613     W_FSTR63           FINAL STUDENT REPLICATE BRR-FAY WEIGHT63
614     W_FSTR64           FINAL STUDENT REPLICATE BRR-FAY WEIGHT64
615     W_FSTR65           FINAL STUDENT REPLICATE BRR-FAY WEIGHT65
616     W_FSTR66           FINAL STUDENT REPLICATE BRR-FAY WEIGHT66
617     W_FSTR67           FINAL STUDENT REPLICATE BRR-FAY WEIGHT67
618     W_FSTR68           FINAL STUDENT REPLICATE BRR-FAY WEIGHT68
619     W_FSTR69           FINAL STUDENT REPLICATE BRR-FAY WEIGHT69
620     W_FSTR70           FINAL STUDENT REPLICATE BRR-FAY WEIGHT70
621     W_FSTR71           FINAL STUDENT REPLICATE BRR-FAY WEIGHT71
622     W_FSTR72           FINAL STUDENT REPLICATE BRR-FAY WEIGHT72
623     W_FSTR73           FINAL STUDENT REPLICATE BRR-FAY WEIGHT73
624     W_FSTR74           FINAL STUDENT REPLICATE BRR-FAY WEIGHT74
625     W_FSTR75           FINAL STUDENT REPLICATE BRR-FAY WEIGHT75
626     W_FSTR76           FINAL STUDENT REPLICATE BRR-FAY WEIGHT76
627     W_FSTR77           FINAL STUDENT REPLICATE BRR-FAY WEIGHT77
628     W_FSTR78           FINAL STUDENT REPLICATE BRR-FAY WEIGHT78
629     W_FSTR79           FINAL STUDENT REPLICATE BRR-FAY WEIGHT79
630     W_FSTR80           FINAL STUDENT REPLICATE BRR-FAY WEIGHT80
631     WVARSTRR           RANDOMIZED FINAL VARIANCE STRATUM (1-80)
632     VAR_UNIT                    RANDOMLY ASSIGNED VARIANCE UNIT
633   SENWGT_STU  Senate weight - sum of weight within the count...
634      VER_STU                      Date of the database creation
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[173], line 1
----> 1 print(df_pisa_dict).head()

AttributeError: 'NoneType' object has no attribute 'head'
In [174]:
df_pisa.ENTUSE.value_counts().head()                
Out[174]:
-0.0018    14701
 0.0883    13451
-0.1819    13111
-0.0919    13095
 0.1788    12810
Name: ENTUSE, dtype: int64
In [10]:
df_pisa.ST04Q01.value_counts()
Out[10]:
Female    245064
Male      240426
Name: ST04Q01, dtype: int64
In [11]:
df_pisa['AGE'].value_counts()
Out[11]:
15.58    42762
15.67    42353
15.75    41664
15.83    41402
15.92    41084
16.00    41049
15.42    40437
15.50    40291
16.08    39313
16.17    38356
15.33    28354
16.25    26139
15.25    11986
16.33    10183
15.17        1
Name: AGE, dtype: int64
In [175]:
df_pisa['CNT'].value_counts().head()
Out[175]:
Mexico    33806
Italy     31073
Spain     25313
Canada    21544
Brazil    19204
Name: CNT, dtype: int64
In [13]:
df_pisa['CNT'].nunique()
Out[13]:
68
In [14]:
pd.options.display.max_rows = len(df_pisa)
pd.options.display.max_columns = len(df_pisa.columns)
df_pisa.head()
Out[14]:
Unnamed: 0 CNT SUBNATIO STRATUM OECD NC SCHOOLID STIDSTD ST01Q01 ST02Q01 ST03Q01 ST03Q02 ST04Q01 ST05Q01 ST06Q01 ST07Q01 ST07Q02 ST07Q03 ST08Q01 ST09Q01 ST115Q01 ST11Q01 ST11Q02 ST11Q03 ST11Q04 ST11Q05 ST11Q06 ST13Q01 ST14Q01 ST14Q02 ST14Q03 ST14Q04 ST15Q01 ST17Q01 ST18Q01 ST18Q02 ST18Q03 ST18Q04 ST19Q01 ST20Q01 ST20Q02 ST20Q03 ST21Q01 ST25Q01 ST26Q01 ST26Q02 ST26Q03 ST26Q04 ST26Q05 ST26Q06 ST26Q07 ST26Q08 ST26Q09 ST26Q10 ST26Q11 ST26Q12 ST26Q13 ST26Q14 ST26Q15 ST26Q16 ST26Q17 ST27Q01 ST27Q02 ST27Q03 ST27Q04 ST27Q05 ST28Q01 ST29Q01 ST29Q02 ST29Q03 ST29Q04 ST29Q05 ST29Q06 ST29Q07 ST29Q08 ST35Q01 ST35Q02 ST35Q03 ST35Q04 ST35Q05 ST35Q06 ST37Q01 ST37Q02 ST37Q03 ST37Q04 ST37Q05 ST37Q06 ST37Q07 ST37Q08 ST42Q01 ST42Q02 ST42Q03 ST42Q04 ST42Q05 ST42Q06 ST42Q07 ST42Q08 ST42Q09 ST42Q10 ST43Q01 ST43Q02 ST43Q03 ST43Q04 ST43Q05 ST43Q06 ST44Q01 ST44Q03 ST44Q04 ST44Q05 ST44Q07 ST44Q08 ST46Q01 ST46Q02 ST46Q03 ST46Q04 ST46Q05 ST46Q06 ST46Q07 ST46Q08 ST46Q09 ST48Q01 ST48Q02 ST48Q03 ST48Q04 ST48Q05 ST49Q01 ST49Q02 ST49Q03 ST49Q04 ST49Q05 ST49Q06 ST49Q07 ST49Q09 ST53Q01 ST53Q02 ST53Q03 ST53Q04 ST55Q01 ST55Q02 ST55Q03 ST55Q04 ST57Q01 ST57Q02 ST57Q03 ST57Q04 ST57Q05 ST57Q06 ST61Q01 ST61Q02 ST61Q03 ST61Q04 ST61Q05 ST61Q06 ST61Q07 ST61Q08 ST61Q09 ST62Q01 ST62Q02 ST62Q03 ST62Q04 ST62Q06 ST62Q07 ST62Q08 ST62Q09 ST62Q10 ST62Q11 ST62Q12 ST62Q13 ST62Q15 ST62Q16 ST62Q17 ST62Q19 ST69Q01 ST69Q02 ST69Q03 ST70Q01 ST70Q02 ST70Q03 ST71Q01 ST72Q01 ST73Q01 ST73Q02 ST74Q01 ST74Q02 ST75Q01 ST75Q02 ST76Q01 ST76Q02 ST77Q01 ST77Q02 ST77Q04 ST77Q05 ST77Q06 ST79Q01 ST79Q02 ST79Q03 ST79Q04 ST79Q05 ST79Q06 ST79Q07 ST79Q08 ST79Q10 ST79Q11 ST79Q12 ST79Q15 ST79Q17 ST80Q01 ST80Q04 ST80Q05 ST80Q06 ST80Q07 ST80Q08 ST80Q09 ST80Q10 ST80Q11 ST81Q01 ST81Q02 ST81Q03 ST81Q04 ST81Q05 ST82Q01 ST82Q02 ST82Q03 ST83Q01 ST83Q02 ST83Q03 ST83Q04 ST84Q01 ST84Q02 ST84Q03 ST85Q01 ST85Q02 ST85Q03 ST85Q04 ST86Q01 ST86Q02 ST86Q03 ST86Q04 ST86Q05 ST87Q01 ST87Q02 ST87Q03 ST87Q04 ST87Q05 ST87Q06 ST87Q07 ST87Q08 ST87Q09 ST88Q01 ST88Q02 ST88Q03 ST88Q04 ST89Q02 ST89Q03 ST89Q04 ST89Q05 ST91Q01 ST91Q02 ST91Q03 ST91Q04 ST91Q05 ST91Q06 ST93Q01 ST93Q03 ST93Q04 ST93Q06 ST93Q07 ST94Q05 ST94Q06 ST94Q09 ST94Q10 ST94Q14 ST96Q01 ST96Q02 ST96Q03 ST96Q05 ST101Q01 ST101Q02 ST101Q03 ST101Q05 ST104Q01 ST104Q04 ST104Q05 ST104Q06 IC01Q01 IC01Q02 IC01Q03 IC01Q04 IC01Q05 IC01Q06 IC01Q07 IC01Q08 IC01Q09 IC01Q10 IC01Q11 IC02Q01 IC02Q02 IC02Q03 IC02Q04 IC02Q05 IC02Q06 IC02Q07 IC03Q01 IC04Q01 IC05Q01 IC06Q01 IC07Q01 IC08Q01 IC08Q02 IC08Q03 IC08Q04 IC08Q05 IC08Q06 IC08Q07 IC08Q08 IC08Q09 IC08Q11 IC09Q01 IC09Q02 IC09Q03 IC09Q04 IC09Q05 IC09Q06 IC09Q07 IC10Q01 IC10Q02 IC10Q03 IC10Q04 IC10Q05 IC10Q06 IC10Q07 IC10Q08 IC10Q09 IC11Q01 IC11Q02 IC11Q03 IC11Q04 IC11Q05 IC11Q06 IC11Q07 IC22Q01 IC22Q02 IC22Q04 IC22Q06 IC22Q07 IC22Q08 EC01Q01 EC02Q01 EC03Q01 EC03Q02 EC03Q03 EC03Q04 EC03Q05 EC03Q06 EC03Q07 EC03Q08 EC03Q09 EC03Q10 EC04Q01A EC04Q01B EC04Q01C EC04Q02A EC04Q02B EC04Q02C EC04Q03A EC04Q03B EC04Q03C EC04Q04A EC04Q04B EC04Q04C EC04Q05A EC04Q05B EC04Q05C EC04Q06A EC04Q06B EC04Q06C EC05Q01 EC06Q01 EC07Q01 EC07Q02 EC07Q03 EC07Q04 EC07Q05 EC08Q01 EC08Q02 EC08Q03 EC08Q04 EC09Q03 EC10Q01 EC11Q02 EC11Q03 EC12Q01 ST22Q01 ST23Q01 ST23Q02 ST23Q03 ST23Q04 ST23Q05 ST23Q06 ST23Q07 ST23Q08 ST24Q01 ST24Q02 ST24Q03 CLCUSE1 CLCUSE301 CLCUSE302 DEFFORT QUESTID BOOKID EASY AGE GRADE PROGN ANXMAT ATSCHL ATTLNACT BELONG BFMJ2 BMMJ1 CLSMAN COBN_F COBN_M COBN_S COGACT CULTDIST CULTPOS DISCLIMA ENTUSE ESCS EXAPPLM EXPUREM FAILMAT FAMCON FAMCONC FAMSTRUC FISCED HEDRES HERITCUL HISCED HISEI HOMEPOS HOMSCH HOSTCUL ICTATTNEG ICTATTPOS ICTHOME ICTRES ICTSCH IMMIG INFOCAR INFOJOB1 INFOJOB2 INSTMOT INTMAT ISCEDD ISCEDL ISCEDO LANGCOMM LANGN LANGRPPD LMINS MATBEH MATHEFF MATINTFC MATWKETH MISCED MMINS MTSUP OCOD1 OCOD2 OPENPS OUTHOURS PARED PERSEV REPEAT SCMAT SMINS STUDREL SUBNORM TCHBEHFA TCHBEHSO TCHBEHTD TEACHSUP TESTLANG TIMEINT USEMATH USESCH WEALTH ANCATSCHL ANCATTLNACT ANCBELONG ANCCLSMAN ANCCOGACT ANCINSTMOT ANCINTMAT ANCMATWKETH ANCMTSUP ANCSCMAT ANCSTUDREL ANCSUBNORM PV1MATH PV2MATH PV3MATH PV4MATH PV5MATH PV1MACC PV2MACC PV3MACC PV4MACC PV5MACC PV1MACQ PV2MACQ PV3MACQ PV4MACQ PV5MACQ PV1MACS PV2MACS PV3MACS PV4MACS PV5MACS PV1MACU PV2MACU PV3MACU PV4MACU PV5MACU PV1MAPE PV2MAPE PV3MAPE PV4MAPE PV5MAPE PV1MAPF PV2MAPF PV3MAPF PV4MAPF PV5MAPF PV1MAPI PV2MAPI PV3MAPI PV4MAPI PV5MAPI PV1READ PV2READ PV3READ PV4READ PV5READ PV1SCIE PV2SCIE PV3SCIE PV4SCIE PV5SCIE W_FSTUWT W_FSTR1 W_FSTR2 W_FSTR3 W_FSTR4 W_FSTR5 W_FSTR6 W_FSTR7 W_FSTR8 W_FSTR9 W_FSTR10 W_FSTR11 W_FSTR12 W_FSTR13 W_FSTR14 W_FSTR15 W_FSTR16 W_FSTR17 W_FSTR18 W_FSTR19 W_FSTR20 W_FSTR21 W_FSTR22 W_FSTR23 W_FSTR24 W_FSTR25 W_FSTR26 W_FSTR27 W_FSTR28 W_FSTR29 W_FSTR30 W_FSTR31 W_FSTR32 W_FSTR33 W_FSTR34 W_FSTR35 W_FSTR36 W_FSTR37 W_FSTR38 W_FSTR39 W_FSTR40 W_FSTR41 W_FSTR42 W_FSTR43 W_FSTR44 W_FSTR45 W_FSTR46 W_FSTR47 W_FSTR48 W_FSTR49 W_FSTR50 W_FSTR51 W_FSTR52 W_FSTR53 W_FSTR54 W_FSTR55 W_FSTR56 W_FSTR57 W_FSTR58 W_FSTR59 W_FSTR60 W_FSTR61 W_FSTR62 W_FSTR63 W_FSTR64 W_FSTR65 W_FSTR66 W_FSTR67 W_FSTR68 W_FSTR69 W_FSTR70 W_FSTR71 W_FSTR72 W_FSTR73 W_FSTR74 W_FSTR75 W_FSTR76 W_FSTR77 W_FSTR78 W_FSTR79 W_FSTR80 WVARSTRR VAR_UNIT SENWGT_STU VER_STU
0 1 Albania 80000 ALB0006 Non-OECD Albania 1 1 10 1.0 2 1996 Female No 6.0 No, never No, never No, never None None 1.0 Yes Yes Yes Yes NaN NaN <ISCED level 3A> No No No No Other (e.g. home duties, retired) <ISCED level 3A> NaN NaN NaN NaN Working part-time <for pay> Country of test Country of test Country of test NaN Language of the test Yes No Yes No No No No Yes No Yes No Yes No Yes 8002 8001 8002 Two One None None None 0-10 books Agree Strongly agree Agree Agree Agree Agree Agree Strongly agree Disagree Agree Disagree Agree Agree Agree Not at all confident Not very confident Confident Confident Confident Not at all confident Confident Very confident Agree Disagree Agree Agree Agree Agree Agree Disagree Disagree Disagree Agree Disagree Disagree Agree NaN Disagree Likely Slightly likely Likely Likely Likely Very Likely Agree Agree Agree Agree Agree Agree Agree Agree Agree Courses after school Test Language Major in college Science Study harder Test Language Maximum classes Science Pursuing a career Math Often Sometimes Sometimes Sometimes Sometimes Never or rarely Never or rarely Never or rarely NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Every Lesson Every Lesson Every Lesson Every Lesson Every Lesson Never or Hardly Ever Most Lessons Never or Hardly Ever Every Lesson Most Lessons Every Lesson Every Lesson Every Lesson Never or Hardly Ever Most Lessons Every Lesson Every Lesson Every Lesson Always or almost always Sometimes Never or rarely Always or almost always Always or almost always Always or almost always Always or almost always Often Often Never or Hardly Ever Never or Hardly Ever Never or Hardly Ever Never or Hardly Ever Never or Hardly Ever Strongly disagree Strongly disagree Strongly disagree Strongly disagree Agree Agree Agree Strongly agree Strongly agree Disagree Agree Strongly disagree Disagree Agree Agree Strongly disagree Agree Agree Disagree Agree Agree Strongly disagree Strongly agree Strongly agree Strongly disagree Agree Strongly disagree Agree Agree Strongly agree Strongly disagree Strongly disagree Agree Strongly agree Strongly agree Strongly agree Strongly agree Strongly agree Strongly agree Strongly disagree Disagree Strongly disagree Very much like me Very much like me Very much like me Somewhat like me Very much like me Somewhat like me Mostly like me Mostly like me Mostly like me Somewhat like me definitely do this definitely do this definitely do this definitely do this 4.0 2.0 1.0 1.0 1.0 2.0 1.0 1.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 99 99 99 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN A Simple calculator 99 99 99 StQ Form B booklet 7 Standard set of booklets 16.17 0.0 Albania: Upper secondary education 0.32 -2.31 0.5206 -1.18 76.49 79.74 -1.3771 Albania Albania Albania 0.6994 NaN -0.48 1.85 NaN NaN NaN NaN 0.6400 NaN NaN 2.0 ISCED 3A, ISCED 4 -1.29 NaN ISCED 3A, ISCED 4 NaN -2.61 NaN NaN NaN NaN NaN -3.16 NaN Native NaN NaN NaN 0.80 0.91 A ISCED level 3 General NaN Albanian NaN NaN 0.6426 -0.77 -0.7332 0.2882 ISCED 3A, ISCED 4 NaN -0.9508 Building architects Primary school teachers 0.0521 NaN 12.0 -0.3407 Did not repeat a <grade> 0.41 NaN -1.04 -0.0455 1.3625 0.9374 0.4297 1.68 Albanian NaN NaN NaN -2.92 -1.8636 -0.6779 -0.7351 -0.7808 -0.0219 -0.1562 0.0486 -0.2199 -0.5983 -0.0807 -0.5901 -0.3346 406.8469 376.4683 344.5319 321.1637 381.9209 325.8374 324.2795 279.8800 267.4170 312.5954 409.1837 388.1524 373.3525 389.7102 415.4152 351.5423 375.6894 341.4161 386.5945 426.3203 396.7207 334.4057 328.9531 339.8582 354.6580 324.2795 345.3108 381.1419 380.3630 346.8687 319.6059 345.3108 360.8895 390.4892 322.7216 290.7852 345.3108 326.6163 407.6258 367.1210 249.5762 254.3420 406.8496 175.7053 218.5981 341.7009 408.8400 348.2283 367.8105 392.9877 8.9096 13.1249 13.0829 4.5315 13.0829 13.9235 13.1249 13.1249 4.3389 4.3313 13.7954 4.5315 4.3313 13.7954 13.9235 4.3389 4.3313 4.5084 4.5084 13.7954 4.5315 13.1249 13.0829 4.5315 13.0829 13.9235 13.1249 13.1249 4.3389 4.3313 13.7954 4.5315 4.3313 13.7954 13.9235 4.3389 4.3313 4.5084 4.5084 13.7954 4.5315 4.5084 4.5315 13.0829 4.5315 4.3313 4.5084 4.5084 13.7954 13.9235 4.3389 13.0829 13.9235 4.3389 4.3313 13.7954 13.9235 13.1249 13.1249 4.3389 13.0829 4.5084 4.5315 13.0829 4.5315 4.3313 4.5084 4.5084 13.7954 13.9235 4.3389 13.0829 13.9235 4.3389 4.3313 13.7954 13.9235 13.1249 13.1249 4.3389 13.0829 19 1 0.2098 22NOV13
1 2 Albania 80000 ALB0006 Non-OECD Albania 1 2 10 1.0 2 1996 Female Yes, for more than one year 7.0 No, never No, never No, never One or two times None 1.0 Yes Yes NaN Yes NaN NaN <ISCED level 3A> Yes Yes No No Working full-time <for pay> <ISCED level 3A> No No No No Working full-time <for pay> Country of test Country of test Country of test NaN Language of the test Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes 8001 8001 8002 Three or more Three or more Three or more Two Two 201-500 books Disagree Strongly agree Disagree Disagree Agree Agree Disagree Disagree Strongly agree Strongly agree Disagree Agree Disagree Agree Confident Very confident Very confident Confident Very confident Confident Very confident Not very confident NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Strongly agree Strongly agree Strongly disagree Disagree Agree Disagree Likely Slightly likely Slightly likely Very Likely Slightly likely Likely Agree Agree Strongly agree Strongly agree Strongly agree Agree Agree Disagree Agree Courses after school Math Major in college Science Study harder Math Maximum classes Science Pursuing a career Science Sometimes Often Always or almost always Sometimes Always or almost always Never or rarely Never or rarely Often relating to known Improve understanding in my sleep Repeat examples I do not attend <out-of-school time lessons> i... 2 or more but less than 4 hours a week 2 or more but less than 4 hours a week Less than 2 hours a week NaN NaN 6.0 0.0 0.0 2.0 Rarely Rarely Frequently Sometimes Frequently Sometimes Frequently Never Frequently Know it well, understand the concept Know it well, understand the concept Heard of it once or twice Know it well, understand the concept Know it well, understand the concept Know it well, understand the concept Never heard of it Know it well, understand the concept Know it well, understand the concept Never heard of it Know it well, understand the concept Heard of it once or twice Know it well, understand the concept Know it well, understand the concept Never heard of it Heard of it often 45.0 45.0 45.0 7.0 6.0 2.0 NaN 30.0 Frequently Sometimes Frequently Frequently Sometimes Sometimes Sometimes Sometimes NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Not at all like me Not at all like me Mostly like me Somewhat like me Very much like me Somewhat like me Not much like me Not much like me Mostly like me Not much like me probably not do this probably do this probably not do this probably do this 1.0 2.0 3.0 2.0 2.0 3.0 1.0 1.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 99 99 99 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN A Simple calculator 99 99 99 StQ Form A booklet 9 Standard set of booklets 16.17 0.0 Albania: Upper secondary education NaN NaN NaN NaN 15.35 23.47 NaN Albania Albania Albania NaN NaN 1.27 NaN NaN NaN -0.0681 0.7955 0.1524 0.6387 -0.08 2.0 ISCED 3A, ISCED 4 1.12 NaN ISCED 5A, 6 NaN 1.41 NaN NaN NaN NaN NaN 1.15 NaN Native NaN NaN NaN -0.39 0.00 A ISCED level 3 General NaN Albanian NaN 315.0 1.4702 0.34 -0.2514 0.6490 ISCED 5A, 6 270.0 NaN Tailors, dressmakers, furriers and hatters Building construction labourers -0.9492 8.0 16.0 1.3116 Did not repeat a <grade> NaN 90.0 NaN 0.6602 NaN NaN NaN NaN Albanian NaN NaN NaN 0.69 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 486.1427 464.3325 453.4273 472.9008 476.0165 325.6816 419.9330 378.6493 359.9548 384.1019 373.1968 444.0801 456.5431 401.2385 461.2167 366.9653 459.6588 426.1645 423.0488 443.3011 389.5544 438.6275 417.5962 379.4283 438.6275 440.1854 456.5431 486.9216 458.1010 444.0801 411.3647 437.8486 457.3220 454.2063 460.4378 434.7328 448.7537 494.7110 429.2803 434.7328 406.2936 349.8975 400.7334 369.7553 396.7618 548.9929 471.5964 471.5964 443.6218 454.8116 8.9096 13.1249 13.0829 4.5315 13.0829 13.9235 13.1249 13.1249 4.3389 4.3313 13.7954 4.5315 4.3313 13.7954 13.9235 4.3389 4.3313 4.5084 4.5084 13.7954 4.5315 13.1249 13.0829 4.5315 13.0829 13.9235 13.1249 13.1249 4.3389 4.3313 13.7954 4.5315 4.3313 13.7954 13.9235 4.3389 4.3313 4.5084 4.5084 13.7954 4.5315 4.5084 4.5315 13.0829 4.5315 4.3313 4.5084 4.5084 13.7954 13.9235 4.3389 13.0829 13.9235 4.3389 4.3313 13.7954 13.9235 13.1249 13.1249 4.3389 13.0829 4.5084 4.5315 13.0829 4.5315 4.3313 4.5084 4.5084 13.7954 13.9235 4.3389 13.0829 13.9235 4.3389 4.3313 13.7954 13.9235 13.1249 13.1249 4.3389 13.0829 19 1 0.2098 22NOV13
2 3 Albania 80000 ALB0006 Non-OECD Albania 1 3 9 1.0 9 1996 Female Yes, for more than one year 6.0 No, never No, never No, never None None 1.0 Yes Yes No Yes No No <ISCED level 3B, 3C> Yes Yes Yes No Working full-time <for pay> <ISCED level 3A> Yes No Yes Yes Working full-time <for pay> Country of test Country of test Country of test NaN Language of the test Yes Yes Yes Yes No Yes Yes Yes Yes Yes No Yes No Yes 8001 8001 8001 Three or more Two Two One Two More than 500 books Agree Strongly agree Agree Agree Strongly agree Strongly agree Strongly agree Strongly agree Strongly agree Strongly agree Agree Strongly agree Strongly agree Agree Confident Very confident Very confident Confident Very confident Not very confident Very confident Confident NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Strongly agree Agree Strongly agree Strongly disagree Strongly agree Strongly disagree Likely Likely Very Likely Very Likely Very Likely Slightly likely Strongly agree Strongly agree Strongly agree Strongly agree Strongly agree Agree Strongly agree Strongly agree Strongly agree Courses after school Math Major in college Science Study harder Math Maximum classes Science Pursuing a career Science Sometimes Always or almost always Sometimes Never or rarely Always or almost always Never or rarely Never or rarely Never or rarely Most important Improve understanding learning goals more information Less than 2 hours a week 2 or more but less than 4 hours a week 4 or more but less than 6 hours a week I do not attend <out-of-school time lessons> i... NaN 6.0 6.0 7.0 2.0 3.0 Frequently Sometimes Frequently Rarely Frequently Rarely Frequently Sometimes Frequently Never heard of it Know it well, understand the concept Heard of it once or twice Know it well, understand the concept Know it well, understand the concept Know it well, understand the concept Heard of it once or twice Know it well, understand the concept Know it well, understand the concept Heard of it once or twice Know it well, understand the concept Know it well, understand the concept Know it well, understand the concept Know it well, understand the concept Know it well, understand the concept Know it well, understand the concept 60.0 NaN NaN 5.0 4.0 2.0 24.0 30.0 Frequently Frequently Frequently Frequently Frequently Frequently Rarely Rarely NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Not much like me Not much like me Very much like me Very much like me Somewhat like me Mostly like me Mostly like me Very much like me Mostly like me Very much like me probably not do this definitely do this definitely not do this probably do this 1.0 3.0 4.0 1.0 3.0 4.0 1.0 1.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 99 99 99 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN A Simple calculator 99 99 99 StQ Form A booklet 3 Standard set of booklets 15.58 -1.0 Albania: Lower secondary education NaN NaN NaN NaN 22.57 NaN NaN Albania Albania Albania NaN NaN 1.27 NaN NaN NaN 0.5359 0.7955 1.2219 0.8215 -0.89 2.0 ISCED 5A, 6 -0.69 NaN ISCED 5A, 6 NaN 0.14 NaN NaN NaN NaN NaN -0.40 NaN Native NaN NaN NaN 1.59 1.23 A ISCED level 2 General NaN Albanian NaN 300.0 0.9618 0.34 -0.2514 2.0389 ISCED 5A, 6 NaN NaN Housewife Bricklayers and related workers 0.9383 24.0 16.0 0.9918 Did not repeat a <grade> NaN NaN NaN 2.2350 NaN NaN NaN NaN Albanian NaN NaN NaN -0.23 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 533.2684 481.0796 489.6479 490.4269 533.2684 611.1622 486.5322 567.5417 541.0578 544.9525 597.1413 495.1005 576.8889 507.5635 556.6365 594.8045 473.2902 554.2997 537.1631 568.3206 471.7324 431.2276 460.8272 419.5435 456.9325 559.7523 501.3320 555.0787 467.0587 506.7845 580.7836 481.0796 555.0787 453.8168 491.2058 527.0369 444.4695 516.1318 403.9648 476.4060 401.2100 404.3872 387.7067 431.3938 401.2100 499.6643 428.7952 492.2044 512.7191 499.6643 8.4871 12.7307 12.7307 4.2436 12.7307 12.7307 12.7307 12.7307 4.2436 4.2436 12.7307 4.2436 4.2436 12.7307 12.7307 4.2436 4.2436 4.2436 4.2436 12.7307 4.2436 12.7307 12.7307 4.2436 12.7307 12.7307 12.7307 12.7307 4.2436 4.2436 12.7307 4.2436 4.2436 12.7307 12.7307 4.2436 4.2436 4.2436 4.2436 12.7307 4.2436 4.2436 4.2436 12.7307 4.2436 4.2436 4.2436 4.2436 12.7307 12.7307 4.2436 12.7307 12.7307 4.2436 4.2436 12.7307 12.7307 12.7307 12.7307 4.2436 12.7307 4.2436 4.2436 12.7307 4.2436 4.2436 4.2436 4.2436 12.7307 12.7307 4.2436 12.7307 12.7307 4.2436 4.2436 12.7307 12.7307 12.7307 12.7307 4.2436 12.7307 19 1 0.1999 22NOV13
3 4 Albania 80000 ALB0006 Non-OECD Albania 1 4 9 1.0 8 1996 Female Yes, for more than one year 6.0 No, never No, never No, never None None 1.0 Yes Yes No Yes No No <ISCED level 3B, 3C> No No No No Working full-time <for pay> <ISCED level 3A> Yes Yes No No Working full-time <for pay> Country of test Country of test Country of test NaN Language of the test Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes No Yes Yes No 8001 8001 8002 Three or more Two One None One 11-25 books NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Strongly agree Disagree Agree Agree Disagree Strongly agree Disagree Agree Agree NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN relating to known new ways learning goals more information I do not attend <out-of-school time lessons> i... I do not attend <out-of-school time lessons> i... Less than 2 hours a week I do not attend <out-of-school time lessons> i... 10.0 2.0 2.0 0.0 0.0 3.0 Sometimes Sometimes Sometimes Sometimes Frequently Sometimes Frequently Rarely Frequently Heard of it often Heard of it often Heard of it often Know it well, understand the concept Heard of it a few times Know it well, understand the concept Never heard of it Know it well, understand the concept Know it well, understand the concept Never heard of it Know it well, understand the concept Never heard of it Know it well, understand the concept Know it well, understand the concept Heard of it often Heard of it often 45.0 45.0 45.0 3.0 3.0 2.0 NaN 28.0 Frequently Sometimes Frequently Rarely Frequently Frequently Sometimes Sometimes Every Lesson Every Lesson Every Lesson Every Lesson Every Lesson NaN Every Lesson Every Lesson Every Lesson Every Lesson Every Lesson Every Lesson Every Lesson Every Lesson Never or Hardly Ever Most Lessons Every Lesson Every Lesson Always or almost always NaN NaN NaN NaN NaN NaN NaN Never or rarely Never or Hardly Ever Never or Hardly Ever Never or Hardly Ever Never or Hardly Ever NaN NaN NaN Strongly agree Strongly agree Strongly agree Strongly agree NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 99 99 99 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 99 99 99 StQ Form C booklet 2 Standard set of booklets 15.67 -1.0 Albania: Lower secondary education 0.31 NaN NaN NaN 14.21 NaN NaN Albania Albania Albania -0.3788 NaN 1.27 1.80 NaN NaN 0.3220 0.7955 NaN 0.7266 0.24 2.0 ISCED 5A, 6 0.04 NaN ISCED 5A, 6 NaN -0.73 NaN NaN NaN NaN NaN -0.40 NaN Native NaN NaN NaN NaN NaN A ISCED level 2 General NaN Albanian NaN 135.0 NaN NaN NaN NaN ISCED 3B, C 135.0 1.6748 Housewife Cleaners and helpers in offices, hotels and ot... NaN 17.0 16.0 NaN Did not repeat a <grade> 0.18 90.0 NaN NaN 0.7644 3.3108 2.3916 1.68 Albanian NaN NaN NaN -1.17 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 412.2215 498.6836 415.3373 466.7472 454.2842 538.4094 511.9255 553.9882 483.8838 479.2102 525.1675 529.0622 539.1883 516.5992 501.7993 658.3658 567.2301 669.2709 652.1343 645.1239 508.0308 522.0517 524.3885 495.5678 458.1788 524.3885 462.0735 494.0100 459.7367 471.4208 534.5147 455.8420 504.1362 454.2842 483.8838 521.2728 481.5470 503.3572 469.8629 478.4312 547.3630 481.4353 461.5776 425.0393 471.9036 438.6796 481.5740 448.9370 474.1141 426.5573 8.4871 12.7307 12.7307 4.2436 12.7307 12.7307 12.7307 12.7307 4.2436 4.2436 12.7307 4.2436 4.2436 12.7307 12.7307 4.2436 4.2436 4.2436 4.2436 12.7307 4.2436 12.7307 12.7307 4.2436 12.7307 12.7307 12.7307 12.7307 4.2436 4.2436 12.7307 4.2436 4.2436 12.7307 12.7307 4.2436 4.2436 4.2436 4.2436 12.7307 4.2436 4.2436 4.2436 12.7307 4.2436 4.2436 4.2436 4.2436 12.7307 12.7307 4.2436 12.7307 12.7307 4.2436 4.2436 12.7307 12.7307 12.7307 12.7307 4.2436 12.7307 4.2436 4.2436 12.7307 4.2436 4.2436 4.2436 4.2436 12.7307 12.7307 4.2436 12.7307 12.7307 4.2436 4.2436 12.7307 12.7307 12.7307 12.7307 4.2436 12.7307 19 1 0.1999 22NOV13
4 5 Albania 80000 ALB0006 Non-OECD Albania 1 5 9 1.0 10 1996 Female Yes, for more than one year 6.0 No, never No, never No, never One or two times None 2.0 Yes Yes Yes NaN NaN NaN She did not complete <ISCED level 1> No No No No Working part-time <for pay> <ISCED level 3B, 3C> No No No Yes Working part-time <for pay> Country of test Country of test Country of test NaN Language of the test Yes Yes No Yes Yes Yes Yes Yes Yes Yes No Yes Yes Yes 8001 8002 8001 Two One Two None One 101-200 books Disagree Strongly agree Disagree Disagree Strongly agree Strongly agree Strongly agree Strongly agree Strongly agree Strongly agree Strongly agree Strongly agree Strongly agree Agree Confident Very confident NaN Very confident Very confident Confident Very confident Not very confident Strongly agree Strongly agree Agree Strongly agree Strongly agree Disagree Disagree Disagree Agree Agree Strongly agree Strongly agree Disagree Disagree Strongly agree Disagree Likely Likely Likely Likely Slightly likely Very Likely Strongly agree Strongly agree Agree Strongly agree Strongly agree Agree Strongly agree Strongly agree Strongly agree Courses after school Test Language Major in college Math Study harder Math Maximum classes Math Pursuing a career Math Always or almost always Always or almost always Often Often Sometimes NaN Sometimes Sometimes NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Every Lesson Most Lessons Every Lesson Most Lessons Some Lessons Some Lessons Some Lessons Most Lessons Some Lessons Most Lessons Every Lesson Most Lessons Every Lesson Some Lessons Most Lessons Most Lessons Every Lesson Most Lessons Always or almost always Often Sometimes Often Often Often Always or almost always Often Often Some Lessons Some Lessons NaN Most Lessons Never or Hardly Ever Strongly disagree Disagree Strongly agree Strongly agree Agree Strongly agree Agree Disagree Strongly agree Disagree Agree Agree Strongly agree Agree Agree Agree Agree Agree Agree Strongly disagree Strongly agree Strongly agree Strongly disagree Strongly agree Strongly disagree Strongly agree Strongly agree Strongly agree Disagree Strongly disagree Strongly agree Strongly agree Strongly agree Strongly agree Strongly agree Strongly agree Agree Strongly agree Strongly disagree Agree Strongly agree Disagree NaN Mostly like me Very much like me Very much like me Very much like me Very much like me Very much like me Mostly like me Very much like me Mostly like me definitely do this definitely do this definitely do this definitely do this 1.0 2.0 1.0 2.0 1.0 2.0 1.0 1.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 99 99 99 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 99 99 99 StQ Form B booklet 4 Standard set of booklets 15.50 -1.0 Albania: Lower secondary education 1.02 1.38 1.2115 2.63 80.92 NaN -0.0784 Albania Albania Albania 0.5403 NaN 1.27 -0.08 NaN NaN NaN NaN 0.6400 NaN NaN 2.0 ISCED 3A, ISCED 4 -0.69 NaN ISCED 3A, ISCED 4 NaN -0.57 NaN NaN NaN NaN NaN 0.24 NaN Native NaN NaN NaN 1.59 0.30 A ISCED level 2 General NaN Albanian NaN NaN 1.8169 0.41 0.6584 1.6881 None NaN 0.6709 Housewife Economists 1.2387 NaN 12.0 1.0819 Did not repeat a <grade> -0.06 NaN -0.02 2.8039 0.7644 0.9374 0.4297 0.11 Albanian NaN NaN NaN -1.17 0.6517 0.4908 0.8675 0.0505 0.4940 0.9986 0.0486 0.9341 0.4052 0.0358 0.2492 1.2260 381.9209 328.1742 403.7311 418.5309 395.1628 373.3525 293.1220 364.0053 430.2150 403.7311 414.6362 385.8155 392.8260 448.9095 474.6144 417.7520 353.1002 424.7624 457.4778 459.0357 339.0793 309.4797 340.6372 369.4579 384.2577 373.3525 392.0470 347.6476 342.1950 342.1950 432.5518 431.7729 399.0575 369.4579 341.4161 297.0167 353.8791 347.6476 314.1533 311.0375 311.7707 141.7883 293.5015 272.8495 260.1405 361.5628 275.7740 372.7527 403.5248 422.1746 8.4871 12.7307 12.7307 4.2436 12.7307 12.7307 12.7307 12.7307 4.2436 4.2436 12.7307 4.2436 4.2436 12.7307 12.7307 4.2436 4.2436 4.2436 4.2436 12.7307 4.2436 12.7307 12.7307 4.2436 12.7307 12.7307 12.7307 12.7307 4.2436 4.2436 12.7307 4.2436 4.2436 12.7307 12.7307 4.2436 4.2436 4.2436 4.2436 12.7307 4.2436 4.2436 4.2436 12.7307 4.2436 4.2436 4.2436 4.2436 12.7307 12.7307 4.2436 12.7307 12.7307 4.2436 4.2436 12.7307 12.7307 12.7307 12.7307 4.2436 12.7307 4.2436 4.2436 12.7307 4.2436 4.2436 4.2436 4.2436 12.7307 12.7307 4.2436 12.7307 12.7307 4.2436 4.2436 12.7307 12.7307 12.7307 12.7307 4.2436 12.7307 19 1 0.1999 22NOV13
In [184]:
df_pisa.info(verbose = True, show_counts = True)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 485490 entries, 0 to 485489
Data columns (total 636 columns):
 #    Column       Non-Null Count   Dtype  
---   ------       --------------   -----  
 0    Unnamed: 0   485490 non-null  int64  
 1    CNT          485490 non-null  object 
 2    SUBNATIO     485490 non-null  int64  
 3    STRATUM      485490 non-null  object 
 4    OECD         485490 non-null  object 
 5    NC           485490 non-null  object 
 6    SCHOOLID     485490 non-null  int64  
 7    STIDSTD      485490 non-null  int64  
 8    ST01Q01      485490 non-null  int64  
 9    ST02Q01      485438 non-null  float64
 10   ST03Q01      485490 non-null  int64  
 11   ST03Q02      485490 non-null  int64  
 12   ST04Q01      485490 non-null  object 
 13   ST05Q01      476166 non-null  object 
 14   ST06Q01      457994 non-null  float64
 15   ST07Q01      436690 non-null  object 
 16   ST07Q02      431278 non-null  object 
 17   ST07Q03      305687 non-null  object 
 18   ST08Q01      479143 non-null  object 
 19   ST09Q01      479131 non-null  object 
 20   ST115Q01     479269 non-null  float64
 21   ST11Q01      460559 non-null  object 
 22   ST11Q02      441036 non-null  object 
 23   ST11Q03      400076 non-null  object 
 24   ST11Q04      390768 non-null  object 
 25   ST11Q05      348180 non-null  object 
 26   ST11Q06      337638 non-null  object 
 27   ST13Q01      457979 non-null  object 
 28   ST14Q01      390481 non-null  object 
 29   ST14Q02      407641 non-null  object 
 30   ST14Q03      382441 non-null  object 
 31   ST14Q04      304215 non-null  object 
 32   ST15Q01      467751 non-null  object 
 33   ST17Q01      443261 non-null  object 
 34   ST18Q01      371415 non-null  object 
 35   ST18Q02      387796 non-null  object 
 36   ST18Q03      362834 non-null  object 
 37   ST18Q04      292093 non-null  object 
 38   ST19Q01      451410 non-null  object 
 39   ST20Q01      476363 non-null  object 
 40   ST20Q02      472518 non-null  object 
 41   ST20Q03      469141 non-null  object 
 42   ST21Q01      32728 non-null   float64
 43   ST25Q01      465496 non-null  object 
 44   ST26Q01      473079 non-null  object 
 45   ST26Q02      469693 non-null  object 
 46   ST26Q03      472020 non-null  object 
 47   ST26Q04      473877 non-null  object 
 48   ST26Q05      463178 non-null  object 
 49   ST26Q06      473182 non-null  object 
 50   ST26Q07      465860 non-null  object 
 51   ST26Q08      467094 non-null  object 
 52   ST26Q09      467249 non-null  object 
 53   ST26Q10      471242 non-null  object 
 54   ST26Q11      463566 non-null  object 
 55   ST26Q12      474039 non-null  object 
 56   ST26Q13      469115 non-null  object 
 57   ST26Q14      474076 non-null  object 
 58   ST26Q15      485490 non-null  int64  
 59   ST26Q16      485490 non-null  int64  
 60   ST26Q17      485490 non-null  int64  
 61   ST27Q01      477079 non-null  object 
 62   ST27Q02      476548 non-null  object 
 63   ST27Q03      473459 non-null  object 
 64   ST27Q04      472499 non-null  object 
 65   ST27Q05      469643 non-null  object 
 66   ST28Q01      473765 non-null  object 
 67   ST29Q01      315911 non-null  object 
 68   ST29Q02      315473 non-null  object 
 69   ST29Q03      314928 non-null  object 
 70   ST29Q04      314737 non-null  object 
 71   ST29Q05      315231 non-null  object 
 72   ST29Q06      314746 non-null  object 
 73   ST29Q07      315066 non-null  object 
 74   ST29Q08      315232 non-null  object 
 75   ST35Q01      315860 non-null  object 
 76   ST35Q02      315315 non-null  object 
 77   ST35Q03      314873 non-null  object 
 78   ST35Q04      315160 non-null  object 
 79   ST35Q05      314843 non-null  object 
 80   ST35Q06      313389 non-null  object 
 81   ST37Q01      314644 non-null  object 
 82   ST37Q02      314624 non-null  object 
 83   ST37Q03      313883 non-null  object 
 84   ST37Q04      313416 non-null  object 
 85   ST37Q05      313970 non-null  object 
 86   ST37Q06      313678 non-null  object 
 87   ST37Q07      314070 non-null  object 
 88   ST37Q08      314112 non-null  object 
 89   ST42Q01      313855 non-null  object 
 90   ST42Q02      313502 non-null  object 
 91   ST42Q03      312176 non-null  object 
 92   ST42Q04      311980 non-null  object 
 93   ST42Q05      312624 non-null  object 
 94   ST42Q06      312327 non-null  object 
 95   ST42Q07      312583 non-null  object 
 96   ST42Q08      312456 non-null  object 
 97   ST42Q09      312223 non-null  object 
 98   ST42Q10      312853 non-null  object 
 99   ST43Q01      314971 non-null  object 
 100  ST43Q02      314182 non-null  object 
 101  ST43Q03      313494 non-null  object 
 102  ST43Q04      313420 non-null  object 
 103  ST43Q05      313228 non-null  object 
 104  ST43Q06      313470 non-null  object 
 105  ST44Q01      314119 non-null  object 
 106  ST44Q03      313405 non-null  object 
 107  ST44Q04      312645 non-null  object 
 108  ST44Q05      312996 non-null  object 
 109  ST44Q07      312970 non-null  object 
 110  ST44Q08      313374 non-null  object 
 111  ST46Q01      313898 non-null  object 
 112  ST46Q02      313567 non-null  object 
 113  ST46Q03      312994 non-null  object 
 114  ST46Q04      312997 non-null  object 
 115  ST46Q05      313043 non-null  object 
 116  ST46Q06      312900 non-null  object 
 117  ST46Q07      312854 non-null  object 
 118  ST46Q08      312989 non-null  object 
 119  ST46Q09      313040 non-null  object 
 120  ST48Q01      294410 non-null  object 
 121  ST48Q02      289827 non-null  object 
 122  ST48Q03      298479 non-null  object 
 123  ST48Q04      267716 non-null  object 
 124  ST48Q05      287992 non-null  object 
 125  ST49Q01      313495 non-null  object 
 126  ST49Q02      313025 non-null  object 
 127  ST49Q03      312168 non-null  object 
 128  ST49Q04      312378 non-null  object 
 129  ST49Q05      312582 non-null  object 
 130  ST49Q06      312571 non-null  object 
 131  ST49Q07      312425 non-null  object 
 132  ST49Q09      312752 non-null  object 
 133  ST53Q01      309947 non-null  object 
 134  ST53Q02      309880 non-null  object 
 135  ST53Q03      309272 non-null  object 
 136  ST53Q04      308931 non-null  object 
 137  ST55Q01      307761 non-null  object 
 138  ST55Q02      308171 non-null  object 
 139  ST55Q03      306090 non-null  object 
 140  ST55Q04      304130 non-null  object 
 141  ST57Q01      301367 non-null  float64
 142  ST57Q02      269808 non-null  float64
 143  ST57Q03      283813 non-null  float64
 144  ST57Q04      279657 non-null  float64
 145  ST57Q05      289502 non-null  float64
 146  ST57Q06      289428 non-null  float64
 147  ST61Q01      312799 non-null  object 
 148  ST61Q02      312284 non-null  object 
 149  ST61Q03      311616 non-null  object 
 150  ST61Q04      310304 non-null  object 
 151  ST61Q05      311698 non-null  object 
 152  ST61Q06      311376 non-null  object 
 153  ST61Q07      311797 non-null  object 
 154  ST61Q08      311498 non-null  object 
 155  ST61Q09      309084 non-null  object 
 156  ST62Q01      306484 non-null  object 
 157  ST62Q02      307481 non-null  object 
 158  ST62Q03      306602 non-null  object 
 159  ST62Q04      306319 non-null  object 
 160  ST62Q06      306733 non-null  object 
 161  ST62Q07      306627 non-null  object 
 162  ST62Q08      306640 non-null  object 
 163  ST62Q09      307479 non-null  object 
 164  ST62Q10      306316 non-null  object 
 165  ST62Q11      305550 non-null  object 
 166  ST62Q12      306327 non-null  object 
 167  ST62Q13      306158 non-null  object 
 168  ST62Q15      306297 non-null  object 
 169  ST62Q16      306406 non-null  object 
 170  ST62Q17      306784 non-null  object 
 171  ST62Q19      307729 non-null  object 
 172  ST69Q01      299618 non-null  float64
 173  ST69Q02      298601 non-null  float64
 174  ST69Q03      291943 non-null  float64
 175  ST70Q01      296878 non-null  float64
 176  ST70Q02      298339 non-null  float64
 177  ST70Q03      289068 non-null  float64
 178  ST71Q01      255665 non-null  float64
 179  ST72Q01      294163 non-null  float64
 180  ST73Q01      309601 non-null  object 
 181  ST73Q02      308965 non-null  object 
 182  ST74Q01      309845 non-null  object 
 183  ST74Q02      309303 non-null  object 
 184  ST75Q01      309289 non-null  object 
 185  ST75Q02      308663 non-null  object 
 186  ST76Q01      308980 non-null  object 
 187  ST76Q02      308489 non-null  object 
 188  ST77Q01      315248 non-null  object 
 189  ST77Q02      314913 non-null  object 
 190  ST77Q04      314368 non-null  object 
 191  ST77Q05      314827 non-null  object 
 192  ST77Q06      314807 non-null  object 
 193  ST79Q01      314909 non-null  object 
 194  ST79Q02      314328 non-null  object 
 195  ST79Q03      313955 non-null  object 
 196  ST79Q04      313906 non-null  object 
 197  ST79Q05      313637 non-null  object 
 198  ST79Q06      313875 non-null  object 
 199  ST79Q07      314093 non-null  object 
 200  ST79Q08      314201 non-null  object 
 201  ST79Q10      313979 non-null  object 
 202  ST79Q11      313782 non-null  object 
 203  ST79Q12      313472 non-null  object 
 204  ST79Q15      313846 non-null  object 
 205  ST79Q17      314039 non-null  object 
 206  ST80Q01      314171 non-null  object 
 207  ST80Q04      313521 non-null  object 
 208  ST80Q05      312593 non-null  object 
 209  ST80Q06      312490 non-null  object 
 210  ST80Q07      312521 non-null  object 
 211  ST80Q08      312591 non-null  object 
 212  ST80Q09      311814 non-null  object 
 213  ST80Q10      312305 non-null  object 
 214  ST80Q11      312865 non-null  object 
 215  ST81Q01      313982 non-null  object 
 216  ST81Q02      313546 non-null  object 
 217  ST81Q03      312716 non-null  object 
 218  ST81Q04      312994 non-null  object 
 219  ST81Q05      313436 non-null  object 
 220  ST82Q01      311690 non-null  object 
 221  ST82Q02      311243 non-null  object 
 222  ST82Q03      310986 non-null  object 
 223  ST83Q01      313505 non-null  object 
 224  ST83Q02      313112 non-null  object 
 225  ST83Q03      312943 non-null  object 
 226  ST83Q04      312945 non-null  object 
 227  ST84Q01      310981 non-null  object 
 228  ST84Q02      311399 non-null  object 
 229  ST84Q03      310713 non-null  object 
 230  ST85Q01      312474 non-null  object 
 231  ST85Q02      312120 non-null  object 
 232  ST85Q03      311832 non-null  object 
 233  ST85Q04      311727 non-null  object 
 234  ST86Q01      313223 non-null  object 
 235  ST86Q02      312591 non-null  object 
 236  ST86Q03      312188 non-null  object 
 237  ST86Q04      312294 non-null  object 
 238  ST86Q05      311904 non-null  object 
 239  ST87Q01      311776 non-null  object 
 240  ST87Q02      312138 non-null  object 
 241  ST87Q03      310821 non-null  object 
 242  ST87Q04      310998 non-null  object 
 243  ST87Q05      310587 non-null  object 
 244  ST87Q06      310952 non-null  object 
 245  ST87Q07      310281 non-null  object 
 246  ST87Q08      310735 non-null  object 
 247  ST87Q09      311101 non-null  object 
 248  ST88Q01      311250 non-null  object 
 249  ST88Q02      310964 non-null  object 
 250  ST88Q03      310980 non-null  object 
 251  ST88Q04      311371 non-null  object 
 252  ST89Q02      311522 non-null  object 
 253  ST89Q03      311233 non-null  object 
 254  ST89Q04      311243 non-null  object 
 255  ST89Q05      311138 non-null  object 
 256  ST91Q01      311430 non-null  object 
 257  ST91Q02      310396 non-null  object 
 258  ST91Q03      309826 non-null  object 
 259  ST91Q04      309398 non-null  object 
 260  ST91Q05      309610 non-null  object 
 261  ST91Q06      309656 non-null  object 
 262  ST93Q01      312856 non-null  object 
 263  ST93Q03      312140 non-null  object 
 264  ST93Q04      311311 non-null  object 
 265  ST93Q06      312270 non-null  object 
 266  ST93Q07      312259 non-null  object 
 267  ST94Q05      312404 non-null  object 
 268  ST94Q06      312185 non-null  object 
 269  ST94Q09      311413 non-null  object 
 270  ST94Q10      311747 non-null  object 
 271  ST94Q14      312001 non-null  object 
 272  ST96Q01      311381 non-null  object 
 273  ST96Q02      311460 non-null  object 
 274  ST96Q03      311078 non-null  object 
 275  ST96Q05      311319 non-null  object 
 276  ST101Q01     311290 non-null  float64
 277  ST101Q02     310906 non-null  float64
 278  ST101Q03     310321 non-null  float64
 279  ST101Q05     310655 non-null  float64
 280  ST104Q01     310449 non-null  float64
 281  ST104Q04     309969 non-null  float64
 282  ST104Q05     310366 non-null  float64
 283  ST104Q06     310156 non-null  float64
 284  IC01Q01      296977 non-null  object 
 285  IC01Q02      297068 non-null  object 
 286  IC01Q03      295602 non-null  object 
 287  IC01Q04      297305 non-null  object 
 288  IC01Q05      296587 non-null  object 
 289  IC01Q06      294773 non-null  object 
 290  IC01Q07      296116 non-null  object 
 291  IC01Q08      297109 non-null  object 
 292  IC01Q09      296855 non-null  object 
 293  IC01Q10      297451 non-null  object 
 294  IC01Q11      295118 non-null  object 
 295  IC02Q01      296975 non-null  object 
 296  IC02Q02      295618 non-null  object 
 297  IC02Q03      294625 non-null  object 
 298  IC02Q04      296944 non-null  object 
 299  IC02Q05      296167 non-null  object 
 300  IC02Q06      295830 non-null  object 
 301  IC02Q07      294249 non-null  object 
 302  IC03Q01      293216 non-null  object 
 303  IC04Q01      296305 non-null  object 
 304  IC05Q01      485490 non-null  int64  
 305  IC06Q01      485490 non-null  int64  
 306  IC07Q01      485490 non-null  int64  
 307  IC08Q01      294123 non-null  object 
 308  IC08Q02      293646 non-null  object 
 309  IC08Q03      293162 non-null  object 
 310  IC08Q04      293249 non-null  object 
 311  IC08Q05      293822 non-null  object 
 312  IC08Q06      293744 non-null  object 
 313  IC08Q07      293570 non-null  object 
 314  IC08Q08      293053 non-null  object 
 315  IC08Q09      293496 non-null  object 
 316  IC08Q11      293431 non-null  object 
 317  IC09Q01      292880 non-null  object 
 318  IC09Q02      292463 non-null  object 
 319  IC09Q03      291964 non-null  object 
 320  IC09Q04      292024 non-null  object 
 321  IC09Q05      291721 non-null  object 
 322  IC09Q06      291982 non-null  object 
 323  IC09Q07      292051 non-null  object 
 324  IC10Q01      291811 non-null  object 
 325  IC10Q02      291025 non-null  object 
 326  IC10Q03      290262 non-null  object 
 327  IC10Q04      290907 non-null  object 
 328  IC10Q05      291025 non-null  object 
 329  IC10Q06      290268 non-null  object 
 330  IC10Q07      290565 non-null  object 
 331  IC10Q08      290770 non-null  object 
 332  IC10Q09      290815 non-null  object 
 333  IC11Q01      289894 non-null  object 
 334  IC11Q02      289427 non-null  object 
 335  IC11Q03      288868 non-null  object 
 336  IC11Q04      289085 non-null  object 
 337  IC11Q05      289082 non-null  object 
 338  IC11Q06      289269 non-null  object 
 339  IC11Q07      289277 non-null  object 
 340  IC22Q01      290487 non-null  object 
 341  IC22Q02      290000 non-null  object 
 342  IC22Q04      289671 non-null  object 
 343  IC22Q06      289239 non-null  object 
 344  IC22Q07      288871 non-null  object 
 345  IC22Q08      288959 non-null  object 
 346  EC01Q01      156043 non-null  object 
 347  EC02Q01      156008 non-null  object 
 348  EC03Q01      160533 non-null  object 
 349  EC03Q02      165310 non-null  object 
 350  EC03Q03      165197 non-null  object 
 351  EC03Q04      165164 non-null  object 
 352  EC03Q05      164984 non-null  object 
 353  EC03Q06      164935 non-null  object 
 354  EC03Q07      165136 non-null  object 
 355  EC03Q08      164921 non-null  object 
 356  EC03Q09      164992 non-null  object 
 357  EC03Q10      132539 non-null  object 
 358  EC04Q01A     169730 non-null  float64
 359  EC04Q01B     169765 non-null  float64
 360  EC04Q01C     169779 non-null  float64
 361  EC04Q02A     169783 non-null  float64
 362  EC04Q02B     169784 non-null  float64
 363  EC04Q02C     169798 non-null  float64
 364  EC04Q03A     169796 non-null  float64
 365  EC04Q03B     169786 non-null  float64
 366  EC04Q03C     169799 non-null  float64
 367  EC04Q04A     169655 non-null  float64
 368  EC04Q04B     169641 non-null  float64
 369  EC04Q04C     169656 non-null  float64
 370  EC04Q05A     169716 non-null  float64
 371  EC04Q05B     169716 non-null  float64
 372  EC04Q05C     169725 non-null  float64
 373  EC04Q06A     169643 non-null  float64
 374  EC04Q06B     169640 non-null  float64
 375  EC04Q06C     169636 non-null  float64
 376  EC05Q01      129658 non-null  object 
 377  EC06Q01      40345 non-null   object 
 378  EC07Q01      44012 non-null   object 
 379  EC07Q02      43219 non-null   object 
 380  EC07Q03      42277 non-null   object 
 381  EC07Q04      42832 non-null   object 
 382  EC07Q05      42864 non-null   object 
 383  EC08Q01      43633 non-null   object 
 384  EC08Q02      43393 non-null   object 
 385  EC08Q03      43489 non-null   object 
 386  EC08Q04      43330 non-null   object 
 387  EC09Q03      118588 non-null  object 
 388  EC10Q01      43293 non-null   object 
 389  EC11Q02      118637 non-null  object 
 390  EC11Q03      118659 non-null  object 
 391  EC12Q01      42909 non-null   object 
 392  ST22Q01      40721 non-null   object 
 393  ST23Q01      13730 non-null   object 
 394  ST23Q02      13512 non-null   object 
 395  ST23Q03      13497 non-null   object 
 396  ST23Q04      13411 non-null   object 
 397  ST23Q05      13450 non-null   object 
 398  ST23Q06      13373 non-null   object 
 399  ST23Q07      13411 non-null   object 
 400  ST23Q08      13382 non-null   object 
 401  ST24Q01      13457 non-null   object 
 402  ST24Q02      13351 non-null   object 
 403  ST24Q03      13281 non-null   object 
 404  CLCUSE1      412337 non-null  object 
 405  CLCUSE301    485490 non-null  int64  
 406  CLCUSE302    485490 non-null  int64  
 407  DEFFORT      485490 non-null  int64  
 408  QUESTID      485490 non-null  object 
 409  BOOKID       485490 non-null  object 
 410  EASY         485490 non-null  object 
 411  AGE          485374 non-null  float64
 412  GRADE        484617 non-null  float64
 413  PROGN        485490 non-null  object 
 414  ANXMAT       314764 non-null  float64
 415  ATSCHL       312584 non-null  float64
 416  ATTLNACT     311675 non-null  float64
 417  BELONG       313399 non-null  float64
 418  BFMJ2        416150 non-null  float64
 419  BMMJ1        364814 non-null  float64
 420  CLSMAN       312708 non-null  float64
 421  COBN_F       481825 non-null  object 
 422  COBN_M       481843 non-null  object 
 423  COBN_S       481836 non-null  object 
 424  COGACT       314557 non-null  float64
 425  CULTDIST     13380 non-null   float64
 426  CULTPOS      471357 non-null  float64
 427  DISCLIMA     314777 non-null  float64
 428  ENTUSE       295195 non-null  float64
 429  ESCS         473648 non-null  float64
 430  EXAPPLM      313279 non-null  float64
 431  EXPUREM      312602 non-null  float64
 432  FAILMAT      314448 non-null  float64
 433  FAMCON       310304 non-null  float64
 434  FAMCONC      308442 non-null  float64
 435  FAMSTRUC     429058 non-null  float64
 436  FISCED       452903 non-null  object 
 437  HEDRES       477772 non-null  float64
 438  HERITCUL     13496 non-null   float64
 439  HISCED       473091 non-null  object 
 440  HISEI        450621 non-null  float64
 441  HOMEPOS      479807 non-null  float64
 442  HOMSCH       293194 non-null  float64
 443  HOSTCUL      13598 non-null   float64
 444  ICTATTNEG    289744 non-null  float64
 445  ICTATTPOS    290490 non-null  float64
 446  ICTHOME      298740 non-null  float64
 447  ICTRES       477754 non-null  float64
 448  ICTSCH       297995 non-null  float64
 449  IMMIG        471793 non-null  object 
 450  INFOCAR      165792 non-null  float64
 451  INFOJOB1     83305 non-null   float64
 452  INFOJOB2     83305 non-null   float64
 453  INSTMOT      316322 non-null  float64
 454  INTMAT       316708 non-null  float64
 455  ISCEDD       485438 non-null  object 
 456  ISCEDL       485438 non-null  object 
 457  ISCEDO       485438 non-null  object 
 458  LANGCOMM     44094 non-null   float64
 459  LANGN        481765 non-null  object 
 460  LANGRPPD     43137 non-null   float64
 461  LMINS        282866 non-null  float64
 462  MATBEH       313847 non-null  float64
 463  MATHEFF      315948 non-null  float64
 464  MATINTFC     301360 non-null  float64
 465  MATWKETH     314501 non-null  float64
 466  MISCED       467085 non-null  object 
 467  MMINS        283303 non-null  float64
 468  MTSUP        313599 non-null  float64
 469  OCOD1        483887 non-null  object 
 470  OCOD2        482936 non-null  object 
 471  OPENPS       312766 non-null  float64
 472  OUTHOURS     308799 non-null  float64
 473  PARED        473091 non-null  float64
 474  PERSEV       313172 non-null  float64
 475  REPEAT       461117 non-null  object 
 476  SCMAT        314607 non-null  float64
 477  SMINS        270914 non-null  float64
 478  STUDREL      313860 non-null  float64
 479  SUBNORM      316323 non-null  float64
 480  TCHBEHFA     314678 non-null  float64
 481  TCHBEHSO     315114 non-null  float64
 482  TCHBEHTD     315519 non-null  float64
 483  TEACHSUP     316371 non-null  float64
 484  TESTLANG     484697 non-null  object 
 485  TIMEINT      297074 non-null  float64
 486  USEMATH      290260 non-null  float64
 487  USESCH       292585 non-null  float64
 488  WEALTH       479597 non-null  float64
 489  ANCATSCHL    306835 non-null  float64
 490  ANCATTLNACT  306487 non-null  float64
 491  ANCBELONG    307640 non-null  float64
 492  ANCCLSMAN    308467 non-null  float64
 493  ANCCOGACT    308150 non-null  float64
 494  ANCINSTMOT   155221 non-null  float64
 495  ANCINTMAT    155280 non-null  float64
 496  ANCMATWKETH  153879 non-null  float64
 497  ANCMTSUP     308631 non-null  float64
 498  ANCSCMAT     306948 non-null  float64
 499  ANCSTUDREL   308058 non-null  float64
 500  ANCSUBNORM   155233 non-null  float64
 501  PV1MATH      485490 non-null  float64
 502  PV2MATH      485490 non-null  float64
 503  PV3MATH      485490 non-null  float64
 504  PV4MATH      485490 non-null  float64
 505  PV5MATH      485490 non-null  float64
 506  PV1MACC      473031 non-null  float64
 507  PV2MACC      473031 non-null  float64
 508  PV3MACC      473031 non-null  float64
 509  PV4MACC      473031 non-null  float64
 510  PV5MACC      473031 non-null  float64
 511  PV1MACQ      473031 non-null  float64
 512  PV2MACQ      473031 non-null  float64
 513  PV3MACQ      473031 non-null  float64
 514  PV4MACQ      473031 non-null  float64
 515  PV5MACQ      473031 non-null  float64
 516  PV1MACS      473031 non-null  float64
 517  PV2MACS      473031 non-null  float64
 518  PV3MACS      473031 non-null  float64
 519  PV4MACS      473031 non-null  float64
 520  PV5MACS      473031 non-null  float64
 521  PV1MACU      473031 non-null  float64
 522  PV2MACU      473031 non-null  float64
 523  PV3MACU      473031 non-null  float64
 524  PV4MACU      473031 non-null  float64
 525  PV5MACU      473031 non-null  float64
 526  PV1MAPE      471439 non-null  float64
 527  PV2MAPE      471439 non-null  float64
 528  PV3MAPE      471439 non-null  float64
 529  PV4MAPE      471439 non-null  float64
 530  PV5MAPE      471439 non-null  float64
 531  PV1MAPF      471439 non-null  float64
 532  PV2MAPF      471439 non-null  float64
 533  PV3MAPF      471439 non-null  float64
 534  PV4MAPF      471439 non-null  float64
 535  PV5MAPF      471439 non-null  float64
 536  PV1MAPI      471439 non-null  float64
 537  PV2MAPI      471439 non-null  float64
 538  PV3MAPI      471439 non-null  float64
 539  PV4MAPI      471439 non-null  float64
 540  PV5MAPI      471439 non-null  float64
 541  PV1READ      485490 non-null  float64
 542  PV2READ      485490 non-null  float64
 543  PV3READ      485490 non-null  float64
 544  PV4READ      485490 non-null  float64
 545  PV5READ      485490 non-null  float64
 546  PV1SCIE      485490 non-null  float64
 547  PV2SCIE      485490 non-null  float64
 548  PV3SCIE      485490 non-null  float64
 549  PV4SCIE      485490 non-null  float64
 550  PV5SCIE      485490 non-null  float64
 551  W_FSTUWT     485490 non-null  float64
 552  W_FSTR1      485490 non-null  float64
 553  W_FSTR2      485490 non-null  float64
 554  W_FSTR3      485490 non-null  float64
 555  W_FSTR4      485490 non-null  float64
 556  W_FSTR5      485490 non-null  float64
 557  W_FSTR6      485490 non-null  float64
 558  W_FSTR7      485490 non-null  float64
 559  W_FSTR8      485490 non-null  float64
 560  W_FSTR9      485490 non-null  float64
 561  W_FSTR10     485490 non-null  float64
 562  W_FSTR11     485490 non-null  float64
 563  W_FSTR12     485490 non-null  float64
 564  W_FSTR13     485490 non-null  float64
 565  W_FSTR14     485490 non-null  float64
 566  W_FSTR15     485490 non-null  float64
 567  W_FSTR16     485490 non-null  float64
 568  W_FSTR17     485490 non-null  float64
 569  W_FSTR18     485490 non-null  float64
 570  W_FSTR19     485490 non-null  float64
 571  W_FSTR20     485490 non-null  float64
 572  W_FSTR21     485490 non-null  float64
 573  W_FSTR22     485490 non-null  float64
 574  W_FSTR23     485490 non-null  float64
 575  W_FSTR24     485490 non-null  float64
 576  W_FSTR25     485490 non-null  float64
 577  W_FSTR26     485490 non-null  float64
 578  W_FSTR27     485490 non-null  float64
 579  W_FSTR28     485490 non-null  float64
 580  W_FSTR29     485490 non-null  float64
 581  W_FSTR30     485490 non-null  float64
 582  W_FSTR31     485490 non-null  float64
 583  W_FSTR32     485490 non-null  float64
 584  W_FSTR33     485490 non-null  float64
 585  W_FSTR34     485490 non-null  float64
 586  W_FSTR35     485490 non-null  float64
 587  W_FSTR36     485490 non-null  float64
 588  W_FSTR37     485490 non-null  float64
 589  W_FSTR38     485490 non-null  float64
 590  W_FSTR39     485490 non-null  float64
 591  W_FSTR40     485490 non-null  float64
 592  W_FSTR41     485490 non-null  float64
 593  W_FSTR42     485490 non-null  float64
 594  W_FSTR43     485490 non-null  float64
 595  W_FSTR44     485490 non-null  float64
 596  W_FSTR45     485490 non-null  float64
 597  W_FSTR46     485490 non-null  float64
 598  W_FSTR47     485490 non-null  float64
 599  W_FSTR48     485490 non-null  float64
 600  W_FSTR49     485490 non-null  float64
 601  W_FSTR50     485490 non-null  float64
 602  W_FSTR51     485490 non-null  float64
 603  W_FSTR52     485490 non-null  float64
 604  W_FSTR53     485490 non-null  float64
 605  W_FSTR54     485490 non-null  float64
 606  W_FSTR55     485490 non-null  float64
 607  W_FSTR56     485490 non-null  float64
 608  W_FSTR57     485490 non-null  float64
 609  W_FSTR58     485490 non-null  float64
 610  W_FSTR59     485490 non-null  float64
 611  W_FSTR60     485490 non-null  float64
 612  W_FSTR61     485490 non-null  float64
 613  W_FSTR62     485490 non-null  float64
 614  W_FSTR63     485490 non-null  float64
 615  W_FSTR64     485490 non-null  float64
 616  W_FSTR65     485490 non-null  float64
 617  W_FSTR66     485490 non-null  float64
 618  W_FSTR67     485490 non-null  float64
 619  W_FSTR68     485490 non-null  float64
 620  W_FSTR69     485490 non-null  float64
 621  W_FSTR70     485490 non-null  float64
 622  W_FSTR71     485490 non-null  float64
 623  W_FSTR72     485490 non-null  float64
 624  W_FSTR73     485490 non-null  float64
 625  W_FSTR74     485490 non-null  float64
 626  W_FSTR75     485490 non-null  float64
 627  W_FSTR76     485490 non-null  float64
 628  W_FSTR77     485490 non-null  float64
 629  W_FSTR78     485490 non-null  float64
 630  W_FSTR79     485490 non-null  float64
 631  W_FSTR80     485490 non-null  float64
 632  WVARSTRR     485490 non-null  int64  
 633  VAR_UNIT     485490 non-null  int64  
 634  SENWGT_STU   485490 non-null  float64
 635  VER_STU      485490 non-null  object 
dtypes: float64(250), int64(18), object(368)
memory usage: 2.3+ GB

Structure of the dataset:¶

The file provided for this study, pisa2012.csv, contains data from a total of 485'490 students grouped in 636 columns. The dataset contains not only the results from the exam in each category, but also lots of information on the students' background, including variables like country of residence, number of family members and their level of education, possessions or access to different facilities at home and school.

Main feature(s) of interest in this dataset:¶

The main feature of this dataset is the score obtained by the students in each discipline and the potential for understanding how a number of different factors can impact these scores and therefore the level of preparation for students around the world. For the simplification purpose, it was supposed that the impact of the following factors shall be assessed on the performance of students in three (math, science and reading) areas in different countries:

1. Gender (ST04Q01)
2. Family Structure (FAMSTRUC)
3. Immigration status (IMMIG)
4. Education level of Father (FISCED)
5. Eduction level of Mother (MISCED)
6. Household Possessions (HOMEPOS, WEALTH, CULTPOS, HEDRES)
7. Use of ICT at home (ICTHOME)
8. Use of ICT for entertainment (ENTUSE)

Supporting features¶

The focus will be mainly on the performance of students in three subjects: Math, Science, Reading. For the simplification, a new column will be created and mean of all five categories under each subject will be considered.

Data wrangling¶

In order to focus on the areas of interest in our dataset and make it more readable for the users, the following steps are taken to find the relevant responses from the main dataset

In [16]:
#we need only the relevant columns as listed above, the remaining ones are not required for the current assessment:
cols=['CNT','STIDSTD','AGE','ST04Q01', 'FAMSTRUC', 'IMMIG', 'FISCED', 'MISCED', 'HOMEPOS', 'WEALTH', 'CULTPOS', 'HEDRES', 'ICTHOME', 'ENTUSE',
      'PV1MATH','PV2MATH','PV3MATH','PV4MATH','PV5MATH',
      'PV1READ','PV2READ','PV3READ','PV4READ','PV5READ',
      'PV1SCIE','PV2SCIE','PV3SCIE','PV4SCIE','PV5SCIE']
df_pisa_clean = pd.read_csv('pisa2012.csv', usecols=cols, encoding = "ISO-8859-1")
In [17]:
df_pisa_clean.head()
Out[17]:
CNT STIDSTD ST04Q01 AGE CULTPOS ENTUSE FAMSTRUC FISCED HEDRES HOMEPOS ICTHOME IMMIG MISCED WEALTH PV1MATH PV2MATH PV3MATH PV4MATH PV5MATH PV1READ PV2READ PV3READ PV4READ PV5READ PV1SCIE PV2SCIE PV3SCIE PV4SCIE PV5SCIE
0 Albania 1 Female 16.17 -0.48 NaN 2.0 ISCED 3A, ISCED 4 -1.29 -2.61 NaN Native ISCED 3A, ISCED 4 -2.92 406.8469 376.4683 344.5319 321.1637 381.9209 249.5762 254.3420 406.8496 175.7053 218.5981 341.7009 408.8400 348.2283 367.8105 392.9877
1 Albania 2 Female 16.17 1.27 NaN 2.0 ISCED 3A, ISCED 4 1.12 1.41 NaN Native ISCED 5A, 6 0.69 486.1427 464.3325 453.4273 472.9008 476.0165 406.2936 349.8975 400.7334 369.7553 396.7618 548.9929 471.5964 471.5964 443.6218 454.8116
2 Albania 3 Female 15.58 1.27 NaN 2.0 ISCED 5A, 6 -0.69 0.14 NaN Native ISCED 5A, 6 -0.23 533.2684 481.0796 489.6479 490.4269 533.2684 401.2100 404.3872 387.7067 431.3938 401.2100 499.6643 428.7952 492.2044 512.7191 499.6643
3 Albania 4 Female 15.67 1.27 NaN 2.0 ISCED 5A, 6 0.04 -0.73 NaN Native ISCED 3B, C -1.17 412.2215 498.6836 415.3373 466.7472 454.2842 547.3630 481.4353 461.5776 425.0393 471.9036 438.6796 481.5740 448.9370 474.1141 426.5573
4 Albania 5 Female 15.50 1.27 NaN 2.0 ISCED 3A, ISCED 4 -0.69 -0.57 NaN Native None -1.17 381.9209 328.1742 403.7311 418.5309 395.1628 311.7707 141.7883 293.5015 272.8495 260.1405 361.5628 275.7740 372.7527 403.5248 422.1746
In [18]:
df_pisa_clean.to_csv('pisa_clean.csv')
In [19]:
df_pisa_clean.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 485490 entries, 0 to 485489
Data columns (total 29 columns):
 #   Column    Non-Null Count   Dtype  
---  ------    --------------   -----  
 0   CNT       485490 non-null  object 
 1   STIDSTD   485490 non-null  int64  
 2   ST04Q01   485490 non-null  object 
 3   AGE       485374 non-null  float64
 4   CULTPOS   471357 non-null  float64
 5   ENTUSE    295195 non-null  float64
 6   FAMSTRUC  429058 non-null  float64
 7   FISCED    452903 non-null  object 
 8   HEDRES    477772 non-null  float64
 9   HOMEPOS   479807 non-null  float64
 10  ICTHOME   298740 non-null  float64
 11  IMMIG     471793 non-null  object 
 12  MISCED    467085 non-null  object 
 13  WEALTH    479597 non-null  float64
 14  PV1MATH   485490 non-null  float64
 15  PV2MATH   485490 non-null  float64
 16  PV3MATH   485490 non-null  float64
 17  PV4MATH   485490 non-null  float64
 18  PV5MATH   485490 non-null  float64
 19  PV1READ   485490 non-null  float64
 20  PV2READ   485490 non-null  float64
 21  PV3READ   485490 non-null  float64
 22  PV4READ   485490 non-null  float64
 23  PV5READ   485490 non-null  float64
 24  PV1SCIE   485490 non-null  float64
 25  PV2SCIE   485490 non-null  float64
 26  PV3SCIE   485490 non-null  float64
 27  PV4SCIE   485490 non-null  float64
 28  PV5SCIE   485490 non-null  float64
dtypes: float64(23), int64(1), object(5)
memory usage: 107.4+ MB
In [20]:
df_pisa_clean.describe()
Out[20]:
STIDSTD AGE CULTPOS ENTUSE FAMSTRUC HEDRES HOMEPOS ICTHOME WEALTH PV1MATH PV2MATH PV3MATH PV4MATH PV5MATH PV1READ PV2READ PV3READ PV4READ PV5READ PV1SCIE PV2SCIE PV3SCIE PV4SCIE PV5SCIE
count 485490.000000 485374.000000 471357.000000 295195.000000 429058.000000 477772.000000 479807.000000 298740.000000 479597.00000 485490.000000 485490.000000 485490.000000 485490.000000 485490.000000 485490.000000 485490.000000 485490.000000 485490.000000 485490.000000 485490.000000 485490.000000 485490.000000 485490.00000 485490.000000
mean 6134.066201 15.784283 -0.041828 -0.071999 1.889355 -0.195442 -0.324815 -0.100623 -0.33701 469.621653 469.648358 469.648930 469.641832 469.695396 472.004640 472.068052 472.022059 471.926562 472.013506 475.769824 475.813674 475.851549 475.78524 475.820184
std 6733.144944 0.290221 1.001965 1.054459 0.385621 1.074053 1.163213 1.076591 1.21530 103.265391 103.382077 103.407631 103.392286 103.419170 102.505523 102.626198 102.640489 102.576066 102.659989 101.464426 101.514649 101.495072 101.51220 101.566347
min 1.000000 15.170000 -1.510000 -3.974900 1.000000 -3.930000 -6.880000 -4.017800 -6.65000 19.792800 6.473000 42.226200 24.622200 37.085200 0.083400 0.703500 0.703500 4.134400 2.307400 2.648300 2.834800 11.879900 8.42970 17.754600
25% 1811.000000 15.580000 -0.480000 -0.547900 2.000000 -0.690000 -0.980000 -0.689100 -1.04000 395.318600 395.318600 395.240700 395.396500 395.240700 403.600700 403.360100 403.360100 403.354600 403.360100 404.457300 404.457300 404.550500 404.45730 404.457300
50% 3740.000000 15.750000 0.250000 -0.001800 2.000000 0.040000 -0.260000 -0.087200 -0.30000 466.201900 466.124000 466.201900 466.279800 466.435600 475.455000 475.535200 475.455000 475.535200 475.535200 475.699400 475.606100 475.699400 475.97910 475.885900
75% 7456.000000 16.000000 1.270000 0.454600 2.000000 1.120000 0.390000 0.416000 0.43000 541.057800 541.447300 541.291500 541.447300 541.447300 544.502500 544.503500 544.503500 544.502500 544.503500 547.780700 547.873900 547.967200 547.78070 547.780700
max 33806.000000 16.330000 1.270000 4.431900 3.000000 1.120000 4.150000 2.783300 3.25000 962.229300 957.010400 935.745400 943.456900 907.625800 904.802600 881.239200 884.447000 881.159000 901.608600 903.338300 900.540800 867.624000 926.55730 880.958600
In [21]:
#filling missing values:
age_mean = df_pisa_clean.AGE.mean()
df_pisa_clean['AGE'].fillna(age_mean, inplace=True)
In [22]:
df_pisa_clean.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 485490 entries, 0 to 485489
Data columns (total 29 columns):
 #   Column    Non-Null Count   Dtype  
---  ------    --------------   -----  
 0   CNT       485490 non-null  object 
 1   STIDSTD   485490 non-null  int64  
 2   ST04Q01   485490 non-null  object 
 3   AGE       485490 non-null  float64
 4   CULTPOS   471357 non-null  float64
 5   ENTUSE    295195 non-null  float64
 6   FAMSTRUC  429058 non-null  float64
 7   FISCED    452903 non-null  object 
 8   HEDRES    477772 non-null  float64
 9   HOMEPOS   479807 non-null  float64
 10  ICTHOME   298740 non-null  float64
 11  IMMIG     471793 non-null  object 
 12  MISCED    467085 non-null  object 
 13  WEALTH    479597 non-null  float64
 14  PV1MATH   485490 non-null  float64
 15  PV2MATH   485490 non-null  float64
 16  PV3MATH   485490 non-null  float64
 17  PV4MATH   485490 non-null  float64
 18  PV5MATH   485490 non-null  float64
 19  PV1READ   485490 non-null  float64
 20  PV2READ   485490 non-null  float64
 21  PV3READ   485490 non-null  float64
 22  PV4READ   485490 non-null  float64
 23  PV5READ   485490 non-null  float64
 24  PV1SCIE   485490 non-null  float64
 25  PV2SCIE   485490 non-null  float64
 26  PV3SCIE   485490 non-null  float64
 27  PV4SCIE   485490 non-null  float64
 28  PV5SCIE   485490 non-null  float64
dtypes: float64(23), int64(1), object(5)
memory usage: 107.4+ MB
In [23]:
# calculate mean value for math, reading, science
# add a new column for total of means
df_pisa_clean['MATH']=(df_pisa_clean['PV1MATH']+df_pisa_clean['PV2MATH']+df_pisa_clean['PV3MATH']+df_pisa_clean['PV4MATH']+df_pisa_clean['PV5MATH'])/5
df_pisa_clean['READING']=(df_pisa_clean['PV1READ']+df_pisa_clean['PV2READ']+df_pisa_clean['PV3READ']+df_pisa_clean['PV4READ']+df_pisa_clean['PV5READ'])/5
df_pisa_clean['SCIENCE']=(df_pisa_clean['PV1SCIE']+df_pisa_clean['PV2SCIE']+df_pisa_clean['PV3SCIE']+df_pisa_clean['PV4SCIE']+df_pisa_clean['PV5SCIE'])/5
df_pisa_clean['TOTAL']=(df_pisa_clean['MATH']+df_pisa_clean['READING']+df_pisa_clean['SCIENCE'])/3
In [24]:
df_pisa_clean.head()
Out[24]:
CNT STIDSTD ST04Q01 AGE CULTPOS ENTUSE FAMSTRUC FISCED HEDRES HOMEPOS ICTHOME IMMIG MISCED WEALTH PV1MATH PV2MATH PV3MATH PV4MATH PV5MATH PV1READ PV2READ PV3READ PV4READ PV5READ PV1SCIE PV2SCIE PV3SCIE PV4SCIE PV5SCIE MATH READING SCIENCE TOTAL
0 Albania 1 Female 16.17 -0.48 NaN 2.0 ISCED 3A, ISCED 4 -1.29 -2.61 NaN Native ISCED 3A, ISCED 4 -2.92 406.8469 376.4683 344.5319 321.1637 381.9209 249.5762 254.3420 406.8496 175.7053 218.5981 341.7009 408.8400 348.2283 367.8105 392.9877 366.18634 261.01424 371.91348 333.038020
1 Albania 2 Female 16.17 1.27 NaN 2.0 ISCED 3A, ISCED 4 1.12 1.41 NaN Native ISCED 5A, 6 0.69 486.1427 464.3325 453.4273 472.9008 476.0165 406.2936 349.8975 400.7334 369.7553 396.7618 548.9929 471.5964 471.5964 443.6218 454.8116 470.56396 384.68832 478.12382 444.458700
2 Albania 3 Female 15.58 1.27 NaN 2.0 ISCED 5A, 6 -0.69 0.14 NaN Native ISCED 5A, 6 -0.23 533.2684 481.0796 489.6479 490.4269 533.2684 401.2100 404.3872 387.7067 431.3938 401.2100 499.6643 428.7952 492.2044 512.7191 499.6643 505.53824 405.18154 486.60946 465.776413
3 Albania 4 Female 15.67 1.27 NaN 2.0 ISCED 5A, 6 0.04 -0.73 NaN Native ISCED 3B, C -1.17 412.2215 498.6836 415.3373 466.7472 454.2842 547.3630 481.4353 461.5776 425.0393 471.9036 438.6796 481.5740 448.9370 474.1141 426.5573 449.45476 477.46376 453.97240 460.296973
4 Albania 5 Female 15.50 1.27 NaN 2.0 ISCED 3A, ISCED 4 -0.69 -0.57 NaN Native None -1.17 381.9209 328.1742 403.7311 418.5309 395.1628 311.7707 141.7883 293.5015 272.8495 260.1405 361.5628 275.7740 372.7527 403.5248 422.1746 385.50398 256.01010 367.15778 336.223953
In [25]:
# since we used the average value for each of Math, reading and science, no need to keep the old columns.
df_pisa_clean.drop(columns=['PV1MATH','PV2MATH','PV3MATH','PV4MATH','PV5MATH',
                           'PV1READ','PV2READ','PV3READ','PV4READ','PV5READ',
                           'PV1SCIE','PV2SCIE','PV3SCIE','PV4SCIE','PV5SCIE'],inplace=True)
In [26]:
# in order to increas the readability of the dataset, some column names to be revised as follows:
df_pisa_clean.rename(columns={'CNT': 'COUNTRY', 'STIDSTD': 'STUDENT_ID', 'ST04Q01':'GENDER',
                              'MISCED': 'MOTHER_EDUC_LEVEL', 'FISCED':'FATHER_EDUC_LEVEL', 
                              'IMMIG': 'IMMIGRATION_STATUS', 'FAMSTRUC': 'FAMILY_STRUCTURE',
                              'HOMEPOS': 'HOME_POSSESSIONS', 'HEDRES': 'EDUC_RESOURCES', 'CULTPOS': 'CULTURAL_POSSESSIONS', 
                              'WEALTH': 'FAMILY_WEALTH', 'ICTHOME':'ICT_AT_HOME', 'ENTUSE':'ENTERTAINMENT_USE'},inplace=True)
In [27]:
# reordering the columns
df_pisa_clean = df_pisa_clean[['COUNTRY', 'STUDENT_ID', 'GENDER', 'AGE', 'IMMIGRATION_STATUS', 'MOTHER_EDUC_LEVEL', 'FATHER_EDUC_LEVEL',
                              'FAMILY_STRUCTURE', 'FAMILY_WEALTH','HOME_POSSESSIONS', 'EDUC_RESOURCES', 'CULTURAL_POSSESSIONS',
                               'ICT_AT_HOME', 'ENTERTAINMENT_USE', 'MATH', 'READING', 'SCIENCE', 'TOTAL']]
                              
In [28]:
df_pisa_clean.head()
Out[28]:
COUNTRY STUDENT_ID GENDER AGE IMMIGRATION_STATUS MOTHER_EDUC_LEVEL FATHER_EDUC_LEVEL FAMILY_STRUCTURE FAMILY_WEALTH HOME_POSSESSIONS EDUC_RESOURCES CULTURAL_POSSESSIONS ICT_AT_HOME ENTERTAINMENT_USE MATH READING SCIENCE TOTAL
0 Albania 1 Female 16.17 Native ISCED 3A, ISCED 4 ISCED 3A, ISCED 4 2.0 -2.92 -2.61 -1.29 -0.48 NaN NaN 366.18634 261.01424 371.91348 333.038020
1 Albania 2 Female 16.17 Native ISCED 5A, 6 ISCED 3A, ISCED 4 2.0 0.69 1.41 1.12 1.27 NaN NaN 470.56396 384.68832 478.12382 444.458700
2 Albania 3 Female 15.58 Native ISCED 5A, 6 ISCED 5A, 6 2.0 -0.23 0.14 -0.69 1.27 NaN NaN 505.53824 405.18154 486.60946 465.776413
3 Albania 4 Female 15.67 Native ISCED 3B, C ISCED 5A, 6 2.0 -1.17 -0.73 0.04 1.27 NaN NaN 449.45476 477.46376 453.97240 460.296973
4 Albania 5 Female 15.50 Native None ISCED 3A, ISCED 4 2.0 -1.17 -0.57 -0.69 1.27 NaN NaN 385.50398 256.01010 367.15778 336.223953
In [29]:
# rename some categorical values in order to enhance the readability:
# Family_structure: 
#     1: single parent family
#     2: two parent family
#     3: No parent family

df_pisa_clean.FAMILY_STRUCTURE = df_pisa_clean['FAMILY_STRUCTURE'].replace(to_replace = [1.0 ,2.0 ,3.0], value=['Single Parent', 'Two Parents', 'No Parents'])
In [30]:
df_pisa_clean.FAMILY_STRUCTURE.value_counts()
Out[30]:
Two Parents      360003
Single Parent     58264
No Parents        10791
Name: FAMILY_STRUCTURE, dtype: int64
In [31]:
df_pisa_clean.head()
Out[31]:
COUNTRY STUDENT_ID GENDER AGE IMMIGRATION_STATUS MOTHER_EDUC_LEVEL FATHER_EDUC_LEVEL FAMILY_STRUCTURE FAMILY_WEALTH HOME_POSSESSIONS EDUC_RESOURCES CULTURAL_POSSESSIONS ICT_AT_HOME ENTERTAINMENT_USE MATH READING SCIENCE TOTAL
0 Albania 1 Female 16.17 Native ISCED 3A, ISCED 4 ISCED 3A, ISCED 4 Two Parents -2.92 -2.61 -1.29 -0.48 NaN NaN 366.18634 261.01424 371.91348 333.038020
1 Albania 2 Female 16.17 Native ISCED 5A, 6 ISCED 3A, ISCED 4 Two Parents 0.69 1.41 1.12 1.27 NaN NaN 470.56396 384.68832 478.12382 444.458700
2 Albania 3 Female 15.58 Native ISCED 5A, 6 ISCED 5A, 6 Two Parents -0.23 0.14 -0.69 1.27 NaN NaN 505.53824 405.18154 486.60946 465.776413
3 Albania 4 Female 15.67 Native ISCED 3B, C ISCED 5A, 6 Two Parents -1.17 -0.73 0.04 1.27 NaN NaN 449.45476 477.46376 453.97240 460.296973
4 Albania 5 Female 15.50 Native None ISCED 3A, ISCED 4 Two Parents -1.17 -0.57 -0.69 1.27 NaN NaN 385.50398 256.01010 367.15778 336.223953

Univariate Exploration¶

In this section, we will investigate distributions of individual variables. If we see unusual points or outliers, we will take a deeper look to clean things up and prepare yourself to look at relationships between variables.

QUESTION: How is the distribution of all three subjects of our interest, MATH, SCIENCE and READING?

In [32]:
#creating histograms of three interested subject to have a proper look at their distribution

math_bins=np.arange(50,df_pisa_clean['MATH'].max()+10,10)
reading_bins=np.arange(5,df_pisa_clean['READING'].max()+10,10)
science_bins=np.arange(20,df_pisa_clean['SCIENCE'].max()+10,10)

plt.figure(figsize=(30,8))
plt.subplot(1, 3, 1)
plt.hist(df_pisa_clean['MATH'],bins=math_bins);
plt.title('Maths Average',fontsize=30)
plt.axvline(df_pisa_clean['MATH'].mean(), color='r', linestyle='--', label='mean')
plt.text(df_pisa_clean['MATH'].mean(), 0.01, f"mean: {df_pisa_clean['MATH'].mean():.2f}", ha='left', va = 'bottom', color='r')
plt.legend()

plt.subplot(1, 3, 2)
plt.hist(df_pisa_clean['SCIENCE'],bins=science_bins);
plt.title('Science Average',fontsize=30)
plt.axvline(df_pisa_clean['SCIENCE'].mean(), color='r', linestyle='--', label='mean')
plt.text(df_pisa_clean['SCIENCE'].mean(), 0.01, f"mean: {df_pisa_clean['SCIENCE'].mean():.2f}", ha='left', va = 'bottom', color='r')
plt.legend()

plt.subplot(1, 3, 3)
plt.hist(df_pisa_clean['READING'],bins=reading_bins);
plt.title('Reading Average',fontsize=30)
plt.axvline(df_pisa_clean['READING'].mean(), color='r', linestyle='--', label='mean')
plt.text(df_pisa_clean['READING'].mean(), 0.01, f"mean: {df_pisa_clean['READING'].mean():.2f}", ha='left', va = 'bottom', color='r')
plt.legend()
Out[32]:
<matplotlib.legend.Legend at 0x1cb5897c490>

OBSERVATION: It seems that they are normal distributions. It also indicates the mean values for three interested subjects. Based on the mean of the distribution, it seems that relatively students are performing better in science then reading and then maths accordingly with slightly small margin of difference.

QUESTION: How are the outliers in scores accross these three subjects?

In [33]:
fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(30, 8))
sns.boxplot(y=df_pisa_clean.MATH, ax=ax1)
sns.boxplot(y=df_pisa_clean.SCIENCE, ax=ax2)
sns.boxplot(y=df_pisa_clean.READING, ax=ax3)

ax1.set_title("Maths Average", fontsize = 30)
ax2.set_title("Science Average", fontsize = 30)
ax3.set_title("Reading Average", fontsize = 30)
# Display the plot
plt.show()

OBSERVATION: Box plots are usefull when we want to have a look to variables, if there are any outliers. As it seems, there are no effective outliers in all three interested subjects which can change the outcome of our assessments.

QUESTION: How students are categorized from family wealth point of view?

In [34]:
sns.displot(df_pisa_clean.FAMILY_WEALTH, bins = [-6, -4, -2, 0, 2, 4])
Out[34]:
<seaborn.axisgrid.FacetGrid at 0x1cb58b238e0>

OBSERVATION: Most of students are categorized under 0. In order to understand well, family wealth comprised of the following items, if a student owns in his house: A room of your own, A link to the Internet, Cellular phones, TV, Computer, Car and room with a bath or shower.

QUESTION: Are the majority of students have access to ICT at home?

In [35]:
sns.displot(df_pisa_clean.ICT_AT_HOME, bins = [-4, -2, 0, 2, 4])
df_pisa_clean.query('ICT_AT_HOME<=0').STUDENT_ID.count()/df_pisa_clean.query('ICT_AT_HOME>0').STUDENT_ID.count()
Out[35]:
1.2703023117960877

OBSERVATION: It shows that majority of students are lower or equal than 0 in having ICT at home. ICT at home comprised of having desktop/laptop computer, own cell phone with/without internet, music player, printer, usb and ebook reader.

QUESTION: Are the majority of students have access to entertainment ICT at home?

In [36]:
sns.displot(df_pisa_clean.ENTERTAINMENT_USE, bins = [-4, -2, 0, 2, 4])
df_pisa_clean.query('ENTERTAINMENT_USE<=0').STUDENT_ID.count()/df_pisa_clean.query('ENTERTAINMENT_USE>0').STUDENT_ID.count()
Out[36]:
1.1786895167242346

OBSERVATION: It shows that majority of students are lower or equal than 0 in having entertainment ICT at home, i.e. playing games, participating in social media networks and etc. However this proportion is slightly lower than having ICT at home in general.

QUESTION: How gender among students is distributed?

In [37]:
# To examine how the gender balance is distributed. 

value_counts = df_pisa_clean['GENDER'].value_counts()

# Create a pie chart
plt.pie(value_counts, labels = value_counts.index, startangle = 90, counterclock = False, autopct='%1.0f%%')
plt.title('Gender Distribution')
# Display the plot
plt.show()

OBSERVATION: Gender distribution is equally balanced between male and female students who have been tested.

QUESTION: How students are distributed from immigration point of view?

In [38]:
# Examine the distribution of categorical values, such as IMMIGRATION_STATUS, MOTHER_EDUC_LEVEL, FATHER_EDUC_LEVEL, FAMILY_STRUCTURE
ax = sns.countplot(x=df_pisa_clean.IMMIGRATION_STATUS, data=df_pisa_clean)
abs_values = df_pisa_clean.IMMIGRATION_STATUS.value_counts()
rel_values = df_pisa_clean.IMMIGRATION_STATUS.value_counts(normalize=True).values * 100
lbls = [f'{p[0]} ({p[1]:.1f}%)' for p in zip(abs_values, rel_values)]
ax.bar_label(container=ax.containers[0], labels=lbls)

# Define a formatting function for the y-axis tick labels
def format_yticklabels(value, tick_number):
    return "{:.0f}K".format(value/1000)

# Set the y-axis tick labels using the formatting function
ax.yaxis.set_major_formatter(FuncFormatter(format_yticklabels))

# Show the plot
plt.show()

Observation: It seems that native respondent students are the majority in PISA, while also can be examined that second generation and first generation immigrated students are also existing. It will be interesting to exammine in next chapters, how these generations are performing in each subjec respectively.

QUESTION: How parents education level is distributed?

In [39]:
# Examine the distribution of categorical values, such as IMMIGRATION_STATUS, MOTHER_EDUC_LEVEL, FATHER_EDUC_LEVEL, FAMILY_STRUCTURE

plt.figure(figsize=(15,10))

plt.subplot(1, 2, 1)
ax = sns.countplot(x=df_pisa_clean.MOTHER_EDUC_LEVEL, order=df_pisa_clean.MOTHER_EDUC_LEVEL.value_counts(ascending=False).index,  data=df_pisa_clean)
abs_values = df_pisa_clean.MOTHER_EDUC_LEVEL.value_counts(ascending = False)
rel_values = df_pisa_clean.MOTHER_EDUC_LEVEL.value_counts(ascending = False, normalize=True).values * 100
lbls = [f'{p[0]} ({p[1]:.1f}%)' for p in zip(abs_values, rel_values)]
ax.bar_label(container=ax.containers[0], labels=lbls,  rotation = 90, padding = -35)

# Define a formatting function for the y-axis tick labels
def format_yticklabels(value, tick_number):
    return "{:.0f}K".format(value/1000)

# Set the y-axis tick labels using the formatting function
ax.yaxis.set_major_formatter(FuncFormatter(format_yticklabels))

ax.tick_params(axis='x', rotation=90)


plt.subplot(1, 2, 2)
ax = sns.countplot(x=df_pisa_clean.FATHER_EDUC_LEVEL, order=df_pisa_clean.FATHER_EDUC_LEVEL.value_counts(ascending=False).index,  data=df_pisa_clean)
abs_values = df_pisa_clean.FATHER_EDUC_LEVEL.value_counts(ascending = False)
rel_values = df_pisa_clean.FATHER_EDUC_LEVEL.value_counts(ascending = False, normalize=True).values * 100
lbls = [f'{p[0]} ({p[1]:.1f}%)' for p in zip(abs_values, rel_values)]
ax.bar_label(container=ax.containers[0], labels=lbls,  rotation = 90, padding = -35)

# Define a formatting function for the y-axis tick labels
def format_yticklabels(value, tick_number):
    return "{:.0f}K".format(value/1000)

# Set the y-axis tick labels using the formatting function
ax.yaxis.set_major_formatter(FuncFormatter(format_yticklabels))

ax.tick_params(axis='x', rotation=90)
# Show the plot
plt.show()

Observation: First, we have to understand the categories:

  • ISCED 1 (primary education)
  • ISCED 2 (lower secondary)
  • ISCED Level 3B or 3C (vocational/pre-vocational upper secondary)
  • ISCED 3A (general upper secondary) and/or ISCED 4 (non-tertiary post-secondary)
  • ISCED 5B (vocational tertiary)
  • ISCED 5A, 6 (theoretically oriented tertiary and post-graduate). It seeems that Most of mothers are having general or vocational upper/post secondary eduction. In father eduction, it seems that most of the fathers having the same education level as mothers, but it seems that fathers with no eduction are slightly less than mothers with no education.

QUESTION: How is the family structure among tested students?

In [40]:
# Examine the distribution of categorical values, such as IMMIGRATION_STATUS, MOTHER_EDUC_LEVEL, FATHER_EDUC_LEVEL, FAMILY_STRUCTURE
ax = sns.countplot(x=df_pisa_clean.FAMILY_STRUCTURE, order=df_pisa_clean.FAMILY_STRUCTURE.value_counts(ascending=False).index,  data=df_pisa_clean)
abs_values = df_pisa_clean.FAMILY_STRUCTURE.value_counts(ascending = False)
rel_values = df_pisa_clean.FAMILY_STRUCTURE.value_counts(ascending = False, normalize=True).values * 100
lbls = [f'{p[0]} ({p[1]:.1f}%)' for p in zip(abs_values, rel_values)]
ax.bar_label(container=ax.containers[0], labels=lbls)

# Define a formatting function for the y-axis tick labels
def format_yticklabels(value, tick_number):
    return "{:.0f}K".format(value/1000)

# Set the y-axis tick labels using the formatting function
ax.yaxis.set_major_formatter(FuncFormatter(format_yticklabels))

ax.tick_params(axis='x', rotation=0)
# Show the plot
plt.show()

Observation: It seems that most of the students are having two parent guardians. It will also be interesting to assess how students are performing with single and no parents.

Interesting distributions:¶

It will be interesting to see the performance of students with different family struture, with different family wealth status and with different immigration status. Also, it will be interesting to see how students perform better in each subject from gender point of view or from parents education level.

Unusual distributions or any data modification:¶

Categorizing the value columns of Family wealth, ICT at home and Entertainment ICT at home were done in order to understand better the categories. First its minimum and maximum values were assessed then the bins were defined.

Bivariate Exploration¶

In this section, we will investigate relationships between pairs of variables in our data. Make sure the variables that we cover here have been introduced in some fashion in the previous section (univariate exploration).

QUESTION: How students are performing well in all three subjects considering their family structure?

In [41]:
# now we will examine if family structure is playing any role in performance of students in the mentioned three subjects.
df_pisa_FAM = df_pisa_clean.groupby('FAMILY_STRUCTURE')[['MATH','READING','SCIENCE']].mean()
In [42]:
df_pisa_FAM.plot()
Out[42]:
<AxesSubplot:xlabel='FAMILY_STRUCTURE'>

Observation: It seems that family structure is playing a key role in performance of students in all three subjects.

QUESTION: How the mothers' education level impact the performance of students in all three subjects?

In [171]:
# we will examine the impact of mother's education level on students performance in different subjects: 

df_pisa_MISCED = df_pisa_clean.groupby('MOTHER_EDUC_LEVEL')[['MATH','READING','SCIENCE']].mean()
plt.figure(figsize=(25,10))
plt.suptitle("Imapact of Mother's education level on performance of students", fontsize = 30)

plt.subplot(1, 3, 1)
ax = sns.barplot(x = 'MATH', y = df_pisa_MISCED.index, data = df_pisa_MISCED, order = df_pisa_MISCED.sort_values(by = 'MATH').index)
ax.set_ylabel("Mother's Education Level", fontsize = '20')
plt.subplot(1, 3, 2)
ax = sns.barplot(x = 'SCIENCE', y = df_pisa_MISCED.index, data = df_pisa_MISCED, order = df_pisa_MISCED.sort_values(by = 'SCIENCE').index)
ax.set_ylabel("")
plt.subplot(1, 3, 3)
ax = sns.barplot(x = 'READING', y = df_pisa_MISCED.index, data = df_pisa_MISCED, order = df_pisa_MISCED.sort_values(by = 'READING').index)
ax.set_ylabel("")
Out[171]:
Text(0, 0.5, '')

Observation: It seems that education level of mother is impacting the performance of students in all three subjects. Interesting point is that ISCED 3B and 3C education level of mother is impacting more than ISCED 3A and 4 level in all three subjects. Means that mothers with vocation/pre-vocation upper secondary education is impacting well than mothers with general upper secondary or non-tertiary post secondary education.

QUESTION: How the fathers' education level impact the performance of students in all three subjects?

In [46]:
#examine the impact of father's education level on students performance in different subjects: 
df_pisa_FISCED = df_pisa_clean.groupby('FATHER_EDUC_LEVEL')[['MATH','READING','SCIENCE']].mean()
plt.figure(figsize=(25,10))
plt.suptitle("Imapact of Father's education level on performance of students", fontsize = 20)

plt.subplot(1, 3, 1)
ax = sns.barplot(x = 'MATH', y = df_pisa_FISCED.index, data = df_pisa_FISCED, order = df_pisa_FISCED.sort_values(by = 'MATH').index)
ax.set_ylabel("")
plt.subplot(1, 3, 2)
ax = sns.barplot(x = 'SCIENCE', y = df_pisa_FISCED.index, data = df_pisa_FISCED, order = df_pisa_FISCED.sort_values(by = 'SCIENCE').index)
ax.set_ylabel("")
plt.subplot(1, 3, 3)
ax = sns.barplot(x = 'READING', y = df_pisa_FISCED.index, data = df_pisa_FISCED, order = df_pisa_FISCED.sort_values(by = 'READING').index)
ax.set_ylabel("")
Out[46]:
Text(0, 0.5, '')

Observation: It seems that in Math subject, fathers with vocational upper secondary are having greater impacts on the success of students in comparison of fathers having post secondary or even vocational tertiary education levels. In science and reading subjects, fathers with vocational upper secondary are performing better than those with general upper secondary or post secondary.

QUESTION: How both parents' education level impact the performance of students in all three subjects?

In [47]:
educations = df_pisa_clean.groupby(['FATHER_EDUC_LEVEL','MOTHER_EDUC_LEVEL']).size().reset_index(name='count')
ed = educations.pivot("FATHER_EDUC_LEVEL", "MOTHER_EDUC_LEVEL", "count")
# define the plot
ax = sns.heatmap(ed, annot=True, fmt='d', cmap="YlGnBu")
ax.set_title('Correlation between highest levels of education achieved by each parent')
ax.set_xlabel('Mother\'s education')
ax.set_ylabel('Father\'s education')
Out[47]:
Text(50.72222222222221, 0.5, "Father's education")

Observation: It shows a higher correlation between each parent's education level. As it seems, mothers and fathers with ISCED 5A,6 and ISCED 3A,4 are having the highest impact on success of students in all listed three subjects.

QUESTION: Top 10 well and Top 10 worst countries based on the performance of students in all three subjects.

In [48]:
#Now we will examine the top 10 countries in every subject: 
df_pisa_country = df_pisa_clean.groupby('COUNTRY')[['MATH','READING','SCIENCE',]].mean()
In [49]:
plt.figure(figsize=(23,10))
plt.suptitle("Top 10 well performed countries", fontsize = 20)

plt.subplot(1, 3, 1)
top_10_maths = df_pisa_country.sort_values(by="MATH", ascending=False).head(10)
ax = sns.barplot(x = 'MATH', y = top_10_maths.index, data = top_10_maths)
ax.set_ylabel("")

plt.subplot(1, 3, 2)
top_10_science = df_pisa_country.sort_values(by="SCIENCE", ascending=False).head(10)
ax = sns.barplot(x = 'SCIENCE', y = top_10_science.index, data = top_10_science)
ax.set_ylabel("")

plt.subplot(1, 3, 3)
top_10_reading = df_pisa_country.sort_values(by="READING", ascending=False).head(10)
ax = sns.barplot(x = 'READING', y = top_10_reading.index, data = top_10_reading)
ax.set_ylabel("")
Out[49]:
Text(0, 0.5, '')
In [50]:
plt.figure(figsize=(23,10))
plt.suptitle("Top 10 worst performed countries", fontsize = 20)

plt.subplot(1, 3, 1)
top_10_maths = df_pisa_country.sort_values(by="MATH", ascending=True).head(10)
ax = sns.barplot(x = 'MATH', y = top_10_maths.index, data = top_10_maths)
ax.set_ylabel("")

plt.subplot(1, 3, 2)
top_10_science = df_pisa_country.sort_values(by="SCIENCE", ascending=True).head(10)
ax = sns.barplot(x = 'SCIENCE', y = top_10_science.index, data = top_10_science)
ax.set_ylabel("")

plt.subplot(1, 3, 3)
top_10_reading = df_pisa_country.sort_values(by="READING", ascending=True).head(10)
ax = sns.barplot(x = 'READING', y = top_10_reading.index, data = top_10_reading)
ax.set_ylabel("")
Out[50]:
Text(0, 0.5, '')

Observation: It seems that most of east asian countries are at top in the list.

QUESTION: How is the relationship between each subject?

In [51]:
#now we will examine the relationship between subjects: 

df_pisa_sample = df_pisa_clean.sample(5000)
plt.figure(figsize = [20, 4])

ax1 = plt.subplot(1, 3, 1)
sns.regplot(x = 'MATH', y= 'SCIENCE', data = df_pisa_sample, scatter_kws={'alpha':1/20}, line_kws={"color": "red"})
coef, p = stats.pearsonr(df_pisa_sample['MATH'], df_pisa_sample['SCIENCE'])
# Add the correlation coefficient to the plot
plt.annotate(f'correlation: {round(coef,2)}',xy=(0.5, 0.9), xycoords='axes fraction', color='black', fontsize=14)

ax2 = plt.subplot(1, 3, 2) 
sns.regplot(x = 'MATH', y= 'READING', data = df_pisa_sample, scatter_kws={'alpha':1/20}, line_kws={"color": "red"})
coef, p = stats.pearsonr(df_pisa_sample['MATH'], df_pisa_sample['READING'])
# Add the correlation coefficient to the plot
plt.annotate(f'correlation: {round(coef,2)}',xy=(0.5, 0.9), xycoords='axes fraction', color='black', fontsize=14)

ax3 = plt.subplot(1, 3, 3)
sns.regplot(x = 'SCIENCE', y= 'READING', data = df_pisa_sample, scatter_kws={'alpha':1/20}, line_kws={"color": "red"})
coef, p = stats.pearsonr(df_pisa_sample['SCIENCE'], df_pisa_sample['READING'])
# Add the correlation coefficient to the plot
plt.annotate(f'correlation: {round(coef,2)}',xy=(0.5, 0.9), xycoords='axes fraction', color='black', fontsize=14)
Out[51]:
Text(0.5, 0.9, 'correlation: 0.91')
In [52]:
corr = df_pisa_clean[['MATH', 'READING', 'SCIENCE']].corr()
sns.heatmap(corr, annot=True, cmap='coolwarm')
Out[52]:
<AxesSubplot:>

Observation: Our analysis reveals a strong correlation between all subjects. This suggests that if a student excels in one of these subjects, they will likely excel in the other two as well. However, it's worth noting that the correlation between math and science is particularly strong, while the correlation between math and reading is comparatively weaker

QUESTION: Lets see if ICT at home and use of ICT for entertainment at home having any impact on performance of students in all three subjects?

In [138]:
#now we will examine the relationship between subjects AND use of ICT at home and use of other entertainment ICT at home i.e. games

df_pisa_sample = df_pisa_clean.sample(5000)
df_pisa_sample = df_pisa_sample.dropna()
plt.figure(figsize = [20, 10])
plt.suptitle("Impact of using ICT or entertainment ICT at home on the performance of students", fontsize = 20, y = 1)
ax1 = plt.subplot(3, 2, 1)
sns.regplot(x = 'READING', y= 'ICT_AT_HOME', data = df_pisa_sample, scatter_kws={'alpha':1/20}, line_kws={"color": "red"})
coef, p = stats.pearsonr(df_pisa_sample['READING'], df_pisa_sample['ICT_AT_HOME'])
# Add the correlation coefficient to the plot
plt.annotate(f'correlation: {round(coef,2)}',xy=(0.5, 0.9), xycoords='axes fraction', color='black', fontsize=14)

ax2 = plt.subplot(3, 2, 2) 
sns.regplot(x = 'READING', y= 'ENTERTAINMENT_USE', data = df_pisa_sample, scatter_kws={'alpha':1/20}, line_kws={"color": "red"})
coef, p = stats.pearsonr(df_pisa_sample['READING'], df_pisa_sample['ENTERTAINMENT_USE'])
# Add the correlation coefficient to the plot
plt.annotate(f'correlation: {round(coef,2)}',xy=(0.5, 0.9), xycoords='axes fraction', color='black', fontsize=14)

ax3 = plt.subplot(3, 2, 3) 
sns.regplot(x = 'MATH', y= 'ICT_AT_HOME', data = df_pisa_sample, scatter_kws={'alpha':1/20}, line_kws={"color": "red"})
coef, p = stats.pearsonr(df_pisa_sample['MATH'], df_pisa_sample['ICT_AT_HOME'])
# Add the correlation coefficient to the plot
plt.annotate(f'correlation: {round(coef,2)}',xy=(0.5, 0.9), xycoords='axes fraction', color='black', fontsize=14)


ax4 = plt.subplot(3, 2, 4) 
sns.regplot(x = 'MATH', y= 'ENTERTAINMENT_USE', data = df_pisa_sample, scatter_kws={'alpha':1/20}, line_kws={"color": "red"})
coef, p = stats.pearsonr(df_pisa_sample['MATH'], df_pisa_sample['ENTERTAINMENT_USE'])
# Add the correlation coefficient to the plot
plt.annotate(f'correlation: {round(coef,2)}',xy=(0.5, 0.9), xycoords='axes fraction', color='black', fontsize=14)


ax3 = plt.subplot(3, 2, 5) 
sns.regplot(x = 'SCIENCE', y= 'ICT_AT_HOME', data = df_pisa_sample, scatter_kws={'alpha':1/20}, line_kws={"color": "red"})
coef, p = stats.pearsonr(df_pisa_sample['SCIENCE'], df_pisa_sample['ICT_AT_HOME'])
# Add the correlation coefficient to the plot
plt.annotate(f'correlation: {round(coef,2)}',xy=(0.5, 0.9), xycoords='axes fraction', color='black', fontsize=14)


ax4 = plt.subplot(3, 2, 6) 
sns.regplot(x = 'SCIENCE', y= 'ENTERTAINMENT_USE', data = df_pisa_sample, scatter_kws={'alpha':1/20}, line_kws={"color": "red"})
coef, p = stats.pearsonr(df_pisa_sample['SCIENCE'], df_pisa_sample['ENTERTAINMENT_USE'])
# Add the correlation coefficient to the plot
plt.annotate(f'correlation: {round(coef,2)}',xy=(0.5, 0.9), xycoords='axes fraction', color='black', fontsize=14)
Out[138]:
Text(0.5, 0.9, 'correlation: 0.08')

Observation: It appears that there is no relationship between the use of ICT at home or the use of entertainment ICT at home and the academic performance of students in these subjects. Despite the notion that spending more time watching TV or using video games at home may negatively impact students' performance in school, our data suggests that any correlation between the two is minimal. However, it is still important to consider the impact of these activities on students' physical and mental health.

QUESTION: Lets see if family wealth, cultural possessions at home and educational resources at home having any impact on performance of students in all three subjects?

In [54]:
#now we will examine the relationship between subjects AND use of ICT at home and use of other entertainment ICT at home i.e. games

df_pisa_sample = df_pisa_clean.sample(10000)
df_pisa_sample = df_pisa_sample.dropna()
plt.figure(figsize = [20, 10])

ax1 = plt.subplot(3, 3, 1)
sns.regplot(x = 'READING', y= 'FAMILY_WEALTH', data = df_pisa_sample, scatter_kws={'alpha':1/20}, line_kws={"color": "red"})
coef, p = stats.pearsonr(df_pisa_sample['READING'], df_pisa_sample['FAMILY_WEALTH'])
# Add the correlation coefficient to the plot
plt.annotate(f'correlation: {round(coef,2)}',xy=(0.5, 0.9), xycoords='axes fraction', color='black', fontsize=14)

ax2 = plt.subplot(3, 3, 2)
sns.regplot(x = 'READING', y= 'CULTURAL_POSSESSIONS', data = df_pisa_sample, scatter_kws={'alpha':1/20}, line_kws={"color": "red"})
coef, p = stats.pearsonr(df_pisa_sample['READING'], df_pisa_sample['CULTURAL_POSSESSIONS'])
# Add the correlation coefficient to the plot
plt.annotate(f'correlation: {round(coef,2)}',xy=(0.5, 0.9), xycoords='axes fraction', color='black', fontsize=14)

ax3 = plt.subplot(3, 3, 3)
sns.regplot(x = 'READING', y= 'EDUC_RESOURCES', data = df_pisa_sample, scatter_kws={'alpha':1/20}, line_kws={"color": "red"})
coef, p = stats.pearsonr(df_pisa_sample['READING'], df_pisa_sample['EDUC_RESOURCES'])
# Add the correlation coefficient to the plot
plt.annotate(f'correlation: {round(coef,2)}',xy=(0.5, 0.9), xycoords='axes fraction', color='black', fontsize=14)

ax4 = plt.subplot(3, 3, 4)
sns.regplot(x = 'MATH', y= 'FAMILY_WEALTH', data = df_pisa_sample, scatter_kws={'alpha':1/20}, line_kws={"color": "red"})
coef, p = stats.pearsonr(df_pisa_sample['MATH'], df_pisa_sample['FAMILY_WEALTH'])
# Add the correlation coefficient to the plot
plt.annotate(f'correlation: {round(coef,2)}',xy=(0.5, 0.9), xycoords='axes fraction', color='black', fontsize=14)

ax5 = plt.subplot(3, 3, 5)
sns.regplot(x = 'MATH', y= 'CULTURAL_POSSESSIONS', data = df_pisa_sample, scatter_kws={'alpha':1/20}, line_kws={"color": "red"})
coef, p = stats.pearsonr(df_pisa_sample['MATH'], df_pisa_sample['CULTURAL_POSSESSIONS'])
# Add the correlation coefficient to the plot
plt.annotate(f'correlation: {round(coef,2)}',xy=(0.5, 0.9), xycoords='axes fraction', color='black', fontsize=14)

ax6 = plt.subplot(3, 3, 6)
sns.regplot(x = 'MATH', y= 'EDUC_RESOURCES', data = df_pisa_sample, scatter_kws={'alpha':1/20}, line_kws={"color": "red"})
coef, p = stats.pearsonr(df_pisa_sample['MATH'], df_pisa_sample['EDUC_RESOURCES'])
# Add the correlation coefficient to the plot
plt.annotate(f'correlation: {round(coef,2)}',xy=(0.5, 0.9), xycoords='axes fraction', color='black', fontsize=14)

ax7 = plt.subplot(3, 3, 7)
sns.regplot(x = 'SCIENCE', y= 'FAMILY_WEALTH', data = df_pisa_sample, scatter_kws={'alpha':1/20}, line_kws={"color": "red"})
coef, p = stats.pearsonr(df_pisa_sample['SCIENCE'], df_pisa_sample['FAMILY_WEALTH'])
# Add the correlation coefficient to the plot
plt.annotate(f'correlation: {round(coef,2)}',xy=(0.5, 0.9), xycoords='axes fraction', color='black', fontsize=14)

ax8 = plt.subplot(3, 3, 8)
sns.regplot(x = 'SCIENCE', y= 'CULTURAL_POSSESSIONS', data = df_pisa_sample, scatter_kws={'alpha':1/20}, line_kws={"color": "red"})
coef, p = stats.pearsonr(df_pisa_sample['SCIENCE'], df_pisa_sample['CULTURAL_POSSESSIONS'])
# Add the correlation coefficient to the plot
plt.annotate(f'correlation: {round(coef,2)}',xy=(0.5, 0.9), xycoords='axes fraction', color='black', fontsize=14)

ax9 = plt.subplot(3, 3, 9)
sns.regplot(x = 'SCIENCE', y= 'EDUC_RESOURCES', data = df_pisa_sample, scatter_kws={'alpha':1/20}, line_kws={"color": "red"})
coef, p = stats.pearsonr(df_pisa_sample['SCIENCE'], df_pisa_sample['EDUC_RESOURCES'])
# Add the correlation coefficient to the plot
plt.annotate(f'correlation: {round(coef,2)}',xy=(0.5, 0.9), xycoords='axes fraction', color='black', fontsize=14)
Out[54]:
Text(0.5, 0.9, 'correlation: 0.26')

Observation: It seems that all three factors are not having significant impact on students' performance, but still their impact can not be ignored. Among all three, educational resources are having stronger correlation with performance of students in all three subjects.

Interesting explorations:¶

Correlation between different features were investigated. For example, impact of parents education on students performance, which gender performs well in which subject, impact of immigration status and family structure on performance of students, impact of family wealth, cultural possessions at home and eduactional resources at home on performance of students in all three subjects, impact of ICT use at home and entertainment ICT use at home on performance of students.

Any other new features?¶

Yes, performance of students at country level (top 10 and bottom 10). It will be interesting to evaluate further other factors in these top and bottom performing countries.

Multivariate Exploration¶

Create plots of three or more variables to investigate your data even further. Make sure that your investigations are justified, and follow from your work in the previous sections.

QUESTION: How students are performing well in all three subjects considering their Gender?

In [119]:
df_pisa_gender = df_pisa_clean.groupby('GENDER')[['MATH','READING','SCIENCE','TOTAL']].mean().reset_index()
In [118]:
df_pisa_gender_melt = pd.melt(df_pisa_gender, id_vars=['GENDER'])
In [142]:
sns.barplot(data=df_pisa_gender_melt, x="variable",y="value",hue="GENDER")
plt.title('Average score based on gender')
plt.xlabel('Subjects')
plt.ylabel('Average Score')
plt.ylim(400, 600);
In [56]:
sns.jointplot(x='MATH', y='READING', data=df_pisa_clean, hue = 'GENDER')
Out[56]:
<seaborn.axisgrid.JointGrid at 0x1cbb83c1340>

Observation: It seems that male students are performing better in Maths than female. It also seems that female students are performing better in reading than male students. Both categories are almost performing equally in science. Overall female are slightly better than male students.

QUESTION: Considering family wealth, how the gender looks in different categories?

In [57]:
sns.displot(data=df_pisa_clean, x="FAMILY_WEALTH", hue="GENDER", bins=[-6, -4, -2, 0, 2, 4])
Out[57]:
<seaborn.axisgrid.FacetGrid at 0x1cbb83283a0>
In [157]:
data09 = df_pisa_clean.sample(5000)
data09['FAMILY_WEALTH'] = pd.cut(data09['FAMILY_WEALTH'], bins=[-6, -4, -2, 0, 2, 4])
sns.countplot(x='FAMILY_WEALTH',  hue='GENDER', data=data09)

plt.xlabel('Family Wealth')
plt.ylabel('Number of Students')
plt.title('Distribution of Family Wealth by Gender')
plt.xticks(rotation = 45)
plt.xticks(ticks = [0,1,2,3,4], labels= ['Very Poor','Poor','MiddClass','Upper MiddClass','Rich'])
Out[157]:
([<matplotlib.axis.XTick at 0x1cbed142c70>,
  <matplotlib.axis.XTick at 0x1cbed142c40>,
  <matplotlib.axis.XTick at 0x1cbed142370>,
  <matplotlib.axis.XTick at 0x1cbed064c70>,
  <matplotlib.axis.XTick at 0x1cbed06e220>],
 [Text(0, 0, 'Very Poor'),
  Text(1, 0, 'Poor'),
  Text(2, 0, 'MiddClass'),
  Text(3, 0, 'Upper MiddClass'),
  Text(4, 0, 'Rich')])

Observation: It seems that more female students are under category 0 from family wealth point of view male students are more in wealthier categories.

QUESTION: Considering use of ICT at home, how the gender looks in different categories?

In [59]:
data1= df_pisa_clean
data1 = pd.melt(data1, id_vars=["ICT_AT_HOME"], value_vars=["GENDER"])
sns.displot(data=data1, x="ICT_AT_HOME", hue="value", bins = [-4, -2, 0, 2, 4])
Out[59]:
<seaborn.axisgrid.FacetGrid at 0x1cbb88d6ee0>

Observation: It seems that more male students are using ICT at home than female students.

QUESTION: How is the immigration structure looks like considering the gender of students?

In [60]:
ax = sns.countplot(x=df_pisa_clean.IMMIGRATION_STATUS, hue = 'GENDER',data=df_pisa_clean)
total = float(len(df_pisa_clean))
for i in range(len(df_pisa_clean.IMMIGRATION_STATUS.value_counts())):
    for j in range(2):
        bar = ax.containers[j].patches[i]
        height = bar.get_height()
        ax.annotate("{:.1f}%".format(height/total*100), (bar.get_x() + bar.get_width() / 2, height), ha='center', va='center', xytext=(0, 10), textcoords='offset points')

Observation: It seems that there are slightly more female native students than male students.

QUESTION: How students are categorized based on the education level their parents while considering the gender of students?

In [61]:
data2 = df_pisa_clean.sort_values(by='MOTHER_EDUC_LEVEL', ascending=True)
ax = sns.countplot(x=data2.MOTHER_EDUC_LEVEL, hue = 'GENDER',data=data2)
plt.xticks(rotation=90)
total = float(len(data2))
for i in range(len(data2.MOTHER_EDUC_LEVEL.value_counts())):
    for j in range(2):
        bar = ax.containers[j].patches[i]
        height = bar.get_height()
        ax.annotate("{:.1f}%".format(height/total*100), (bar.get_x() + bar.get_width() / 2, height), ha='center', va='center', xytext=(0, 10), textcoords='offset points', fontsize = 6)
In [62]:
data3 = df_pisa_clean.sort_values(by='FATHER_EDUC_LEVEL', ascending=True)
ax = sns.countplot(x=data3.FATHER_EDUC_LEVEL, hue = 'GENDER',data=data3)
plt.xticks(rotation=90)
total = float(len(data3))
for i in range(len(data3.FATHER_EDUC_LEVEL.value_counts())):
    for j in range(2):
        bar = ax.containers[j].patches[i]
        height = bar.get_height()
        ax.annotate("{:.1f}%".format(height/total*100), (bar.get_x() + bar.get_width() / 2, height), ha='center', va='center', xytext=(0, 10), textcoords='offset points', fontsize = 6)

Observation: It seems that female students are slightly higher in number than male students where their parents dont have any education. Similarly, majority of students where their parents are having the highest level of education are male.

QUESTION: Considering top and bottom 10 countries, how the gender looks among the students?

In [179]:
data = df_pisa_clean

# Calculate the average scores for each country
country_scores = data.groupby('COUNTRY')['MATH', 'READING', 'SCIENCE', 'TOTAL'].mean()

# Select the top and bottom 10 countries
top_10_countries = country_scores.nlargest(10, 'TOTAL').index
bottom_10_countries = country_scores.nsmallest(10, 'TOTAL').index

# Filter the data
top_countries = data[data['COUNTRY'].isin(top_10_countries)].groupby(['COUNTRY','GENDER'])['MATH', 'READING', 'SCIENCE', 'TOTAL'].mean().reset_index()
bottom_countries = data[data['COUNTRY'].isin(bottom_10_countries)].groupby(['COUNTRY','GENDER'])['MATH', 'READING', 'SCIENCE', 'TOTAL'].mean().reset_index()

# Sort the data
top_countries = top_countries.sort_values(by='TOTAL', ascending=False)
bottom_countries = bottom_countries.sort_values(by='TOTAL', ascending=True)

# Create a bar chart of the average scores for the top 10 countries
data_melt = data[data['COUNTRY'].isin(top_10_countries)].melt(id_vars=['COUNTRY', 'GENDER'], value_vars=['MATH', 'READING', 'SCIENCE'])
data_melt = data_melt.sort_values(by=['COUNTRY','value'],ascending=[True,False])
sns.barplot(x='COUNTRY', y='value', hue='variable', data=data_melt)
plt.title("Top 10 Countries: Scores by Subject and Gender")
plt.xticks(rotation=90)
plt.show()
In [180]:
data = df_pisa_clean

# Calculate the average scores for each country
country_scores = data.groupby('COUNTRY')['MATH', 'READING', 'SCIENCE', 'TOTAL'].mean()

# Select the top and bottom 10 countries
top_10_countries = country_scores.nlargest(10, 'TOTAL').index
bottom_10_countries = country_scores.nsmallest(10, 'TOTAL').index

# Filter the data
top_countries = data[data['COUNTRY'].isin(top_10_countries)].groupby(['COUNTRY','GENDER'])['MATH', 'READING', 'SCIENCE', 'TOTAL'].mean().reset_index()
bottom_countries = data[data['COUNTRY'].isin(bottom_10_countries)].groupby(['COUNTRY','GENDER'])['MATH', 'READING', 'SCIENCE', 'TOTAL'].mean().reset_index()

# Sort the data
top_countries = top_countries.sort_values(by='TOTAL', ascending=False)
bottom_countries = bottom_countries.sort_values(by='TOTAL', ascending=True)
sns.barplot(x='COUNTRY', y='TOTAL', hue='GENDER', data=top_countries, ci = None)
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)
plt.title("Top 10 Countries: Average Total Scores by Gender", fontsize = 15)
plt.xlabel("Country", fontsize = 12)
plt.ylabel("Average Total Score", fontsize = 12)
plt.xticks(rotation=90, fontsize = 10)
plt.yticks(fontsize = 10)
plt.show()
In [181]:
data = df_pisa_clean

# Calculate the average scores for each country
country_scores = data.groupby('COUNTRY')['MATH', 'READING', 'SCIENCE', 'TOTAL'].mean()

# Select the top and bottom 10 countries
top_10_countries = country_scores.nlargest(10, 'TOTAL').index
bottom_10_countries = country_scores.nsmallest(10, 'TOTAL').index

# Filter the data
top_countries = data[data['COUNTRY'].isin(top_10_countries)].groupby(['COUNTRY','GENDER'])['MATH', 'READING', 'SCIENCE', 'TOTAL'].mean().reset_index()
bottom_countries = data[data['COUNTRY'].isin(bottom_10_countries)].groupby(['COUNTRY','GENDER'])['MATH', 'READING', 'SCIENCE', 'TOTAL'].mean().reset_index()

# Sort the data
top_countries = top_countries.sort_values(by='TOTAL', ascending=False)
bottom_countries = bottom_countries.sort_values(by='TOTAL', ascending=True)

# Create a bar chart of the average scores for the top 10 countries
sns.barplot(x='COUNTRY', y='TOTAL', hue='GENDER', data=top_countries, ci = None)
plt.title("Top 10 Countries: Average Total Scores by Gender")
plt.xticks(rotation=90)
plt.show()

sns.barplot(x='COUNTRY', y='TOTAL', hue='GENDER', data=bottom_countries, ci = None)
plt.title("Bottom 10 Countries: Average Total Scores by Gender")
plt.xticks(rotation=90)
plt.show()

Observation: It seeems that in top 10 countries, female students are slightly outperform male students. But looking at bottom 10 countries, female students are similary outperform male students.

QUESTION: In top and bottom 10 countries, how the immigration status of the students looks like?

In [182]:
data02 = df_pisa_clean

# Calculate the average scores for each country
country_scores = data02.groupby('COUNTRY')['MATH', 'READING', 'SCIENCE', 'TOTAL'].mean()

# Select the top and bottom 10 countries
top_10_countries = country_scores.nlargest(10, 'TOTAL').index
bottom_10_countries = country_scores.nsmallest(10, 'TOTAL').index

# Filter the data
top_countries = data02[data02['COUNTRY'].isin(top_10_countries)].groupby(['COUNTRY','IMMIGRATION_STATUS'])['MATH', 'READING', 'SCIENCE', 'TOTAL'].mean().reset_index()
bottom_countries = data02[data02['COUNTRY'].isin(bottom_10_countries)].groupby(['COUNTRY','IMMIGRATION_STATUS'])['MATH', 'READING', 'SCIENCE', 'TOTAL'].mean().reset_index()

# Sort the data
top_countries = top_countries.sort_values(by='TOTAL', ascending=False)
bottom_countries = bottom_countries.sort_values(by='TOTAL', ascending=True)

# Create a bar chart of the average scores for the top 10 countries
sns.barplot(x='COUNTRY', y='TOTAL', hue='IMMIGRATION_STATUS', data=top_countries, ci = None)
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)
plt.title("Top 10 Countries: Average Total Scores by Immigration status")
plt.xticks(rotation=90)
plt.show()

sns.barplot(x='COUNTRY', y='TOTAL', hue='IMMIGRATION_STATUS', data=bottom_countries, ci = None)
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)
plt.title("Bottom 10 Countries: Average Total Scores by Immigration status")
plt.xticks(rotation=90)
plt.show()

Observation: It seems that among top 10 countries, only in 5 countries, majority of studnets are native students, while in other 5 remaining, majority of students are either 1st of 2nd generation immigrants.

QUESTION: How the family wealth looks like while considering the immigration status of students?

In [88]:
sns.displot(data=df_pisa_clean, x="FAMILY_WEALTH", hue="IMMIGRATION_STATUS", kde=True, bins=[-6, -4, -2, 0, 2, 4])
Out[88]:
<seaborn.axisgrid.FacetGrid at 0x1cbdd537a60>
In [90]:
data09 = df_pisa_clean.sample(5000)
data09['FAMILY_WEALTH'] = pd.cut(data09['FAMILY_WEALTH'], bins=[-6, -4, -2, 0, 2, 4])
sns.countplot(x='FAMILY_WEALTH',  hue='IMMIGRATION_STATUS', data=data09)
Out[90]:
<AxesSubplot:xlabel='FAMILY_WEALTH', ylabel='count'>
In [156]:
 
Out[156]:
([<matplotlib.axis.XTick at 0x1cbe0e66550>,
  <matplotlib.axis.XTick at 0x1cbe0e66520>,
  <matplotlib.axis.XTick at 0x1cbe0e613d0>,
  <matplotlib.axis.XTick at 0x1cbed11a370>,
  <matplotlib.axis.XTick at 0x1cbed11aac0>],
 [Text(0, 0, 'Very Poor'),
  Text(1, 0, 'Poor'),
  Text(2, 0, 'MiddClass'),
  Text(3, 0, 'Upper MiddClass'),
  Text(4, 0, 'Rich')])

Observation: Majority of second and first generation immigrant students are between -2 and 2 category from family wealth point of view.

QUESTION: How the family structure looks like accross different categories of family wealth?

In [94]:
data10 = df_pisa_clean.sample(5000)
data10['FAMILY_WEALTH'] = pd.cut(data10['FAMILY_WEALTH'], bins=[-6, -4, -2, 0, 2, 4])
sns.countplot(x='FAMILY_WEALTH',  hue='FAMILY_STRUCTURE', data=data10)
Out[94]:
<AxesSubplot:xlabel='FAMILY_WEALTH', ylabel='count'>

Observation: It seems that majority of student with single or no parents are between 0 and -2 category from family wealth point of view. Most wealthy students are only with two parents.

QUESTION: How the cultural possessions at home looks like from gender point of view?

In [117]:
df_pisa_clean.query('CULTURAL_POSSESSIONS<0').GENDER.value_counts()
Out[117]:
Male      119650
Female    107396
Name: GENDER, dtype: int64
In [103]:
sns.displot(data=df_pisa_clean, x="CULTURAL_POSSESSIONS", hue="GENDER", bins=[-2, 0, 2])
Out[103]:
<seaborn.axisgrid.FacetGrid at 0x1cbdca9d3a0>

Observation: It seems that more female students are having cultural belongings at home in comparison to male students.

QUESTION: How students are performing well in all three subjects considering their immigration status?

In [133]:
df_pisa_immig = df_pisa_clean.groupby('IMMIGRATION_STATUS')[['MATH','READING','SCIENCE','TOTAL']].mean().reset_index()
In [131]:
df_pisa_immig_melt = pd.melt(df_pisa_immig, id_vars=['IMMIGRATION_STATUS'])
In [144]:
sns.barplot(data=df_pisa_immig_melt, x="variable",y="value",hue="IMMIGRATION_STATUS")
plt.ylim(200,700)
plt.xlabel('Subjects');
plt.ylabel('Average score values');
plt.title('Average score based on immigration stutus of students');

Observation: It seesms that second generation immigrated students are performing better in Maths than any other two groups, while slightly equal to native students in reading. Native students are performing well in Science than any other two groups. First generation immigrated students are performing the least in all three subjects.

Interesting Observations:¶

All assumptions during the univariate and bivariate assessment of data were strenghtened.

Surprising interactions:¶

More male students are owning luxury items at home than female students while female students are owning more cultural items at home than male students.

Conclusions¶

1. It appears that there is no relationship between the use of ICT at home or the use of entertainment ICT at home and the academic performance of students in these subjects. Despite the notion that spending more time watching TV or using video games at home may negatively impact students' performance in school, our data suggests that any correlation between the two is minimal. However, it is still important to consider the impact of these activities on students' physical and mental health.

2. It seems that male students consistently outperform female students in the subject of mathematics. In contrast, female students typically outperform their male counterparts when it comes to reading. Both student groups perform about equally well in the subject of science. When all three subjects are considered, it seems that female students perform marginally better than male students. This seems to show that while men and women are equally capable of understanding science, they may have different aptitudes for mathematics and reading.

3. An intriguing discovery that can be made from analysing PISA data is the correlation between a student's immigration status and their academic performance. Research has revealed that, generally speaking, students from immigrant families tend to perform less well in mathematics, reading, and science than their native-born peers. However, a closer examination of the data reveals some nuances in this trend. For instance, it has been found that second-generation immigrant students - those who are born in the host country to immigrant parents - tend to perform better in mathematics than both first-generation immigrant students and native-born students. In reading, their performance is slightly equivalent to that of native students, while in science, native students tend to perform better. On the other hand, first-generation immigrant students tend to perform the least in all three subjects, compared to native and second-generation immigrant students. This information highlights the importance of considering the specific experiences and circumstances of immigrant students, rather than making broad generalisations about their performance. It also suggests that, over time, as immigrants and their children become more acculturated and integrated into their host societies, their academic performance tends to improve.

4. An interesting finding that can be obtained from the data is the relationship between family wealth and student performance. For example, it is assumed that students from wealthy families tend to perform better than students from less wealthy families. While the data indicates that despite having a narrow positive correlation, this factor is not a key element for the success of students. Moreover, this relationship can vary depending on gender; in some cases, it is found that boys tend to benefit more from their family's wealth than girls, and vice versa.

5. An interesting finding that is obtained from PISA data is the relationship between cultural possessions and student performance. For example, data has shown that students who have access to more cultural possessions such as books, music, art or other cultural materials tend to perform better especially in Reading than students who have less access to these resources. It also showed that female student are more tend to have such cultural possessions at home than male students.

6. Another key insight that can be gleaned from PISA data is the relationship between a student's country of residence and their academic performance. It has been observed that there is a correlation between where a student lives and how they fare in the areas of mathematics, reading, and science. Interestingly, it has been found that none of the major European economies are among the top-performing countries in PISA, while the top-10 list is primarily dominated by East Asian nations. This information highlights the variations in educational systems and policies between different countries, and how these can impact student performance. It also suggests that certain regions of the world, such as East Asia, have been successful in implementing effective educational policies and practices that lead to high levels of student achievement. It is worth noting that this finding is not conclusive, as it depends on the year of the data and the PISA cycle.

7. The PISA data suggests that a large proportion of mothers have completed either general or vocational upper/post-secondary education. In terms of fathers, it appears that the majority have similar levels of education as their mothers, with a slightly lower percentage of fathers having no education compared to mothers. This indicates a strong correlation between the education levels of each parent. Additionally, the data shows that mothers and fathers who have completed ISCED 5A,6 and ISCED 3A,4 education levels have the greatest impact on the success of students in mathematics, reading, and science.

In [ ]: